Applications of Deep Learning in Fundus Images: A Review
Tao Li a, Wang Bo a, Chunyu Hu a, Hong Kang a, Hanruo Liu b, Kai Wang a,∗, Huazhu Fu c
a College of Computer Science, Nankai University, Tianjin 300350, China
b Beijing Tongren Hospital, Capital Medical University, Beijing 100730, China
c Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, UAE
ABSTRACT
The use of fundus images for the early screening of eye diseases is of great clinical importance. Owing to its powerful performance, deep learning is becoming increasingly popular in related applications, such as lesion segmentation, biomarker segmentation, disease diagnosis and image synthesis. It is therefore necessary to summarize the recent developments in deep learning for fundus images in a review paper. In this review, we introduce 143 application papers with a carefully designed hierarchy. Moreover, 33 publicly available datasets are presented. Summaries and analyses are provided for each task. Finally, limitations common to all tasks are revealed and possible solutions are given. We will also release and regularly update the state-of-the-art results and newly-released datasets at https://github.com/nkicsl/Fundus_Review to adapt to the rapid development of this field.
1. Introduction
According to the World Report on Vision released by the World Health Organization in October 2019, more than 418 million people worldwide suffer from glaucoma, diabetic retinopathy (DR), age-related macular degeneration (AMD) or other eye diseases that can cause blindness. Patients with eye diseases are often unaware of the aggravation of asymptomatic conditions (Robinson, 2003), so early screening and treatment of eye diseases is particularly important.

A fundus image is a projection of the fundus captured by a monocular camera on a 2D plane. Unlike other eye scans, such as OCT images and angiographs, fundus images can be acquired in a non-invasive and cost-effective manner, making them more suitable for large-scale screening (Edupuganti et al., 2018). An example of a fundus image is presented in Fig. 1.

Many important biomarkers can be seen in a fundus image, such as the optic disc (OD), optic cup (OC), macula, fovea and blood vessels, along with some DR-related lesions, such as microaneurysms (MAs), hemorrhages (HEs), hard exudates (EXs) and soft exudates (SEs). Fundus images can be used to diagnose a variety of eye diseases, including glaucoma, DR, AMD, cataract, retinopathy of prematurity (ROP) and diabetic macular edema (DME).

Recently, data-driven deep learning has been widely applied to ophthalmic disease diagnosis based on fundus images.

∗ Corresponding author.
World Report on Vision: https://www.who.int/publications/i/item/world-report-on-vision

Fig. 1. A fundus image from the IDRiD dataset illustrating important biomarkers and lesions.

Compared to traditional methods that use manually designed features, deep learning models can achieve better performance by automatically optimizing features in an end-to-end manner. Most applications of deep learning in fundus images can be coarsely divided into classification, segmentation and synthesis tasks. For brevity, we only list the backbones widely used in fundus image applications. Diagnosis and grading of ophthalmic diseases are two examples of classification tasks; VGG-Net (Simonyan and Zisserman, 2015), Inception (Szegedy et al., 2015, 2016, 2017), ResNet (He et al., 2016) and DenseNet (Huang et al., 2017) are the most widely used classification backbone networks. In terms of segmentation tasks, identifying lesions and biomarkers is of great importance in the diagnosis of diseases. In addition to those used
Fig. 2. Number of papers on fundus images and deep learning in recent years.

for classification, other networks widely used for segmentation in fundus images include FCN (Long et al., 2015), SegNet (Badrinarayanan et al., 2017), U-Net (Ronneberger et al., 2015), Mask R-CNN (He et al., 2017) and DeepLabV3+ (Chen et al., 2018). Finally, in the field of fundus image synthesis, generative adversarial networks (GANs) (Goodfellow et al., 2014) are the dominant architecture.

Motivation.
The results in Fig. 2 show that the number of papers on fundus images and deep learning is increasing year by year. While several review papers already exist, they all differ from ours. For instance, Abramoff et al. (2010) and Zhang et al. (2014) only focus on classical machine learning methods; Salamat et al. (2019), Kanse and Yadav (2019) and Moccia et al. (2018) only consider specific individual diseases, such as DR or glaucoma; and Schmidt-Erfurth et al. (2018), Rahimy and Ehsan (2018) and Ting et al. (2018a,b, 2019) do not discuss specific deep learning methods or structures, but instead simply use "a deep learning method" and similar terms to refer to all methods. Therefore, it is necessary to provide a high-quality review that analyzes the trends and highlights future directions for the applications of deep learning in fundus images.

Data.
In this review, we focus on the successful application of deep learning methods in fundus images from January 2016 to August 2020. We collected 143 papers from the
DBLP, ScienceDirect, JAMA Network, Investigative Ophthalmology & Visual Science and Web of Science databases using the following keywords: retina, fundus, diabetes retinopathy, glaucoma, age-related macular degeneration, cataract, retinal vessel, optic disc/disk/cup, fundus/retinal + lesion/abnormal, hemorrhage, microaneurysm, exudate, neovascularization, drusen, fundus/retinal + synthesis/enhance, fundus/retinal + hypertension/stroke, fundus/retinal + kidney/brain/heart, and fundus/retinal + cardiovascular/cerebrovascular.

Fig. 3. The distribution of papers per task.

The conference sources include
CVPR, AAAI, MICCAI, ISBI, ICMLA and ICIP, and the journal sources include
IEEE TIP, IEEE TMI, MIA, JAMA, JAMA Ophthalmology, Ophthalmology, Investigative Ophthalmology & Visual Science, Diabetes Care, Nature Biomedical Engineering, IEEE TBME, IEEE JBHI, Pattern Recognition, Neural Networks, Information Science, Knowledge-Based Systems, Expert Systems with Applications, Neurocomputing and
Future Generation Computer Systems. The distribution of papers per task is shown in Fig. 3. Distributions of papers per year/source are shown in Fig. 4.

Two recent reviews, Badar et al. (2020) and Sengupta et al. (2020), are similar to ours in terms of view (fundus image, not ophthalmology), style (technical, not clinical) and method (deep learning, not artificial intelligence or machine learning). However, they review only 34 and 62 papers, respectively, while we review 143 papers. Further, as shown in Fig. 5, the scopes are also different. Finally, as shown in Fig. 6, this review utilizes a carefully designed multi-layer hierarchy to organize the related works in a more intuitive manner.

Contributions.
First, we give a comprehensive review of the applications of deep learning in fundus images. Compared to recent works, this review covers more recent papers, more eye diseases and more challenging tasks, notably including image synthesis and several interesting applications in Section 6.
Second, we carefully design the taxonomy of our paper. A knowledge graph is summarized in Fig. 6. The lookup table for the references in the knowledge graph is presented in the Appendix. This can help readers quickly find content of interest.
Third, summaries and analyses are provided for each task. Limitations that are common to current approaches are also described and possible solutions are given in Section 7. This may provide inspiring ideas for researchers in this field.
2. Lesion detection/segmentation

In this section, we review how deep learning methods have been applied to lesion detection/segmentation. The widely used datasets for this task are shown in Tab. 1 (the availability column is linkable in the soft copy). Because of
Fig. 4. The distributions of papers per year and source.
Fig. 5. Comparison with two recent reviews similar in view, style and year.

the correlation between lesion detection/segmentation and DR diagnosis, there is an overlap between the datasets used for the two. The DIARETDB0 dataset (Kauppi et al., 2006) consists of 130 images, of which 110 contain signs of DR (EXs, SEs, MAs, HEs and neovascularization) and 20 are normal. DIARETDB1 (Kauppi et al., 2007) consists of 89 images, of which 84 contain at least mild non-proliferative signs of DR (MAs) and five are normal. The RC-RGB-MA dataset (Dashtbozorg et al., 2018) contains 250 images; MAs were annotated by two experts at the bounding-box level. Images in the RC-SLO-MA dataset (Dashtbozorg et al., 2018) were captured using scanning laser ophthalmoscopy (SLO); the dataset contains 58 images with MA labels. The Retinopathy Online Challenge (ROC) dataset (Niemeijer et al., 2010) consists of 100 images divided into a training set and a test set, both containing 50 images. Center locations of MAs are labeled by experts. The e-ophtha dataset can be divided into two subsets, namely e-ophtha EX and e-ophtha MA. E-ophtha EX (Decenciere et al., 2013) provides pixel-level labels for EX segmentation. It consists of 47 images with exudates and 35 with no lesions. E-ophtha MA (Decenciere et al., 2013) consists of 148 images with MAs or small HEs and 233 healthy images. The Messidor dataset (Decenciere et al., 2014) consists of 1,200 images obtained from three ophthalmologic departments; 540 images are normal and 660 are abnormal. Messidor is divided into three sets, one per department, with different resolutions. 800 images were acquired with pupil dilation and 400 without. Messidor-2 (Abràmoff et al., 2013) extended Messidor to 1,748 images. Unlike Messidor, the images of Messidor-2 all come in pairs. The CLEOPATRA dataset (Sivaprasad et al., 2014) consists of 298 images obtained from 15 hospitals in the UK. It was acquired with different fundus cameras; therefore, the images have different resolutions. Two experts were invited to annotate the ground truths for EXs, HEs and MAs: the first expert marked all images and the second marked 135 images. CLEOPATRA is not available online. The two names Kaggle and EyePACS (California Healthcare Foundation, 2015) refer to the same dataset, which was provided by EyePACS and used in the "Diabetic Retinopathy Detection - Identify signs of diabetic retinopathy in eye images" Kaggle competition. The Kaggle dataset consists of 35,126 training images graded into five DR stages and 53,576 test images of undisclosed stages. Images in the Kaggle dataset were obtained using multiple fundus cameras with different fields of view. The IDRiD dataset (Porwal et al., 2018) was used in the "Diabetic Retinopathy: Segmentation and Grading Challenge" held at ISBI in 2018. It consists of three tasks, namely segmentation, disease grading and localization, with official training and test sets provided. The segmentation task consists of 81 images with ground truths provided for lesions (MAs, HEs, EXs, SEs) and OD areas. The disease grading task consists of 516 images with severity grades for DR and DME. The localization task also consists of 516 images, with annotations for OD and fovea center localization. Note that images in IDRiD have relatively high resolution. The DDR dataset (Li et al., 2019b) consists of 13,673 images obtained from 147 hospitals covering 23 provinces in China. Image-level annotations with five classes of DR severity are provided for all images. In addition, 757 images are provided with pixel-level and bounding-box-level annotations for lesions (MAs, EXs, SEs and HEs).

Experimental results for lesion segmentation on the various datasets introduced in this section are provided in Tab. 2, 3, 4 and 5.

Hemorrhages (HEs) are one of the visible pathological signs of DR. Accurate detection or segmentation of HEs is important for DR diagnosis.
In the task of lesion detection/segmentation, patch-based methods are quite popular because of the limited number of images in datasets and the need to reduce computational costs. Patch-based methods can generate tens of thousands of patches from only dozens of images, which can help improve performance and alleviate the problem of overfitting. However, HEs (as well as other lesions) are typically relatively small in size, with their pixels making up only a small proportion of the whole image. This leads to
Fig. 6. The knowledge graph summarized in this review. The lookup table for the references in the graph can be found in the
Appendix.

Table 1. Widely used datasets for lesion detection/segmentation and DR diagnosis/grading

Dataset | Number of images | Resolution | Camera | Availability
DIARETDB0 | 130 (110 DR, 20 normal) | - | digital fundus cameras with unknown camera settings, 50° FOV | available online
DIARETDB1 | 89 (84 DR, 5 normal) | 1500 × 1152 | - | available online
Retinopathy Online Challenge (ROC) | 100 | - | a Topcon NW 100, a Topcon NW 200 or a Canon CR5-45NM; 2 differently shaped FOVs | available on registration (http://webeye.ophth.uiowa.edu/ROC/)
RC-RGB-MA | 250 | 2595 × 1944 | - | -
RC-SLO-MA | 58 | 1024 × 1024 | - | -
IDRiD | 516 | 4288 × 2848 | - | available online (https://ieee-dataport.org/open-access/indian-diabetic-retinopathy-image-dataset-idrid)
Messidor | 1,200 | 1440 × 960, 2240 × 1488, 2304 × 1536 | - | available on registration
Messidor-2 | 1,748 | 1440 × 960, 2240 × 1488, 2304 × 1536 | - | available on registration
e-ophtha EX | 47 with 12,278 exudates, 35 healthy | ranging from 1440 × 960 to 2544 × 1696 | - | available online
e-ophtha MA | 148 with 1,306 MAs, 233 healthy | ranging from 1440 × 960 to 2544 × 1696 | - | available online
DDR | 13,673 | mixed | 42 types of fundus cameras with a 45° FOV | available online (https://github.com/nkicsl/DDR-dataset)
Kaggle/EyePACS | 35,126 train, 53,576 test | - | multiple fundus cameras with different fields of view | available on registration
CLEOPATRA | 298 | - | multiple fundus cameras | not available online

Table 2. Summary of several results for lesion detection/segmentation on the IDRiD dataset

Reference | Backbone | Loss | PR/% | SE/% | SP/% | ACC/% | AUPR/% | AUC/% | F1/%
Hemorrhage detection/segmentation
Guo et al. (2019) | FCN | Top-k loss, Bin loss | - | - | - | - | - | - | -
Yan et al. (2019a) | U-Net | weighted CE | - | - | - | - | - | - | -
Microaneurysm detection/segmentation
Sarhan et al. (2019) | FCN | Dice loss, CE and Triplet loss | - | - | - | - | - | - | -
Guo et al. (2019) | FCN | Top-k loss, Bin loss | - | - | - | - | - | - | -
Yan et al. (2019a) | U-Net | weighted CE | - | - | - | - | - | - | -
Xue et al. (2019) | Mask R-CNN | log loss, regression loss, CE loss | - | - | - | - | - | - | -
Hard exudate detection/segmentation
Guo et al. (2020a) | HED | Top-k loss, Bin loss | - | - | - | - | - | - | -
Guo et al. (2019) | FCN | Top-k loss, Bin loss | - | - | - | - | - | 79.45 | -
Yan et al. (2019a) | U-Net | weighted CE | - | - | - | - | - | - | -
Xue et al. (2019) | Mask R-CNN | log loss, regression loss, CE loss | - | 77.9 | - | - | - | - | -
Soft exudate detection/segmentation
Guo et al. (2019) | FCN | Top-k loss, Bin loss | - | - | - | - | - | - | -
Yan et al. (2019a) | U-Net | weighted CE | - | - | - | - | - | - | -

Table 3. Summary of several results for lesion detection/segmentation on the e-ophtha dataset

Reference | Task | Backbone | Loss | PR/% | SE/% | SP/% | ACC/% | AUPR/% | AUC/% | F1/%
Carson et al. (2018) | MA classification | CNN | - | - | - | - | - | 86 | 94 | -
Guo et al. (2019) | MA segmentation | FCN | Top-k loss, Bin loss | - | - | - | - | - | - | -
Xue et al. (2019) | MA segmentation | Mask R-CNN | log loss, regression loss and CE loss | - | - | - | - | - | - | -
Carson et al. (2018) | Exudate classification | CNN | - | - | - | - | - | 64 | 95 | -
Guo et al. (2020a) | EX detection | HED | Top-k loss, Bin loss | - | - | - | - | - | - | -
Guo et al. (2019) | EX segmentation | FCN | Top-k loss, Bin loss | - | - | - | - | - | 41.71 | -
Xue et al. (2019) | EX segmentation | Mask R-CNN | log loss, regression loss and CE loss | - | 84.6 | - | - | - | - | -
Playout et al. (2019) | Bright lesion segmentation | U-Net | loss based on Cohen's coefficient | - | - | - | - | - | - | -
Playout et al. (2019) | Red lesion segmentation | U-Net | loss based on Cohen's coefficient | - | - | - | - | - | - | -

Table 4. Summary of several results for lesion detection/segmentation on the DIARETDB1 dataset

Reference | Task | Backbone | Loss | PR/% | SE/% | SP/% | ACC/% | AUC/% | F1/%
Dai et al. (2018) | MA detection | CNN | - | - | - | - | - | - | -
Adem (2018) | Exudate detection | CNN | - | - | - | - | - | - | -
Playout et al. (2018) | Bright lesion segmentation | U-Net | loss based on Cohen's coefficient | - | 75.35 | 99.86 | - | - | -
Playout et al. (2019) | Bright lesion segmentation | U-Net | loss based on Cohen's coefficient | - | - | - | - | - | -
Playout et al. (2018) | Red lesion segmentation | U-Net | loss based on Cohen's coefficient | - | 66.91 | 99.82 | - | - | -
Playout et al. (2019) | Red lesion segmentation | U-Net | loss based on Cohen's coefficient | - | - | - | - | - | -

Table 5. Summary of several results for lesion detection/segmentation on other datasets

Reference | Task | Dataset | Backbone | Loss | SE/% | SP/% | AUC/% | mAP/%
van Grinsven et al. (2016) | HE detection | Kaggle | CNN | CE | - | - | - | -
van Grinsven et al. (2016) | HE detection | Messidor | CNN | CE | - | - | - | -
Huang et al. (2020) | HE segmentation | private | CNN | MSE, IoU, GIoU | - | - | - | -
Yan et al. (2018a) | Drusen segmentation | STARE, DRIVE | encoder-decoder network | - | - | - | - | -
Adem (2018) | Exudate detection | DIARETDB0 | CNN | - | 100 | 98.41 | - | -
Adem (2018) | Exudate detection | DrimDB | CNN | - | 100 | 98.44 | - | -
Tan et al. (2017) | EX detection | CLEOPATRA | CNN | log-likelihood function | - | - | - | -
Tan et al. (2017) | HE detection | CLEOPATRA | CNN | log-likelihood function | - | - | - | -
Tan et al. (2017) | MA detection | CLEOPATRA | CNN | log-likelihood function | - | - | - | -
Guo et al. (2019) | EX segmentation | DDR | FCN | Top-k loss, Bin loss | - | - | - | -
Guo et al. (2019) | SE segmentation | DDR | FCN | Top-k loss, Bin loss | - | - | - | -
Guo et al. (2019) | HE segmentation | DDR | FCN | Top-k loss, Bin loss | - | - | - | -
Guo et al. (2019) | MA segmentation | DDR | FCN | Top-k loss, Bin loss | - | - | - | -

an imbalance problem, where only a few patches contain lesions and a large number do not contribute much to the lesion detection/segmentation task. Imbalance is also common in the other lesion detection/segmentation tasks in this section, details of which will not be repeated for brevity. There are two main directions in the improvement of hemorrhage detection/segmentation, namely selective sampling and performing segmentation on coarsely-annotated datasets.

Selective sampling. van Grinsven et al. (2016) proposed a method called selective sampling to reduce the use of redundant data and speed up CNN training. They invited three experts to relabel the Messidor dataset and a subset of Kaggle. During the training process, the weights of samples were dynamically adjusted according to the current iteration's classification results, so that informative samples were more likely to be included in the next training iteration. Inspired by VGG, they designed a nine-layer CNN as the classifier. On the Kaggle competition and Messidor datasets, experimental results showed that the CNN with selective sampling (SeS) outperformed the CNN without selective sampling (NSeS), and SeS reduced the number of training epochs from 170 to 60.
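The core of selective sampling, re-weighting samples by their current loss so that informative ones are more likely to be drawn again, can be sketched as follows (a minimal numpy reduction; the toy losses and the exact weighting scheme are ours, not van Grinsven et al.'s):

```python
import numpy as np

def select_informative(losses, batch_size, rng):
    # Samples with a high loss in the current iteration are more likely
    # to be drawn again, so training focuses on informative examples.
    probs = losses / losses.sum()
    return rng.choice(len(losses), size=batch_size, replace=False, p=probs)

rng = np.random.default_rng(0)
iteration_losses = np.array([0.01, 0.02, 2.5, 3.0, 0.05])  # toy per-sample losses
next_batch = select_informative(iteration_losses, batch_size=2, rng=rng)
```

In a real training loop the losses would be recomputed every epoch, so the sampling distribution tracks which examples the current network still gets wrong.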
Segmentation on coarsely-annotated datasets.
Huang et al. (2020) proposed a bounding box refining network (BBR-Net) that can generate more accurate bounding-box annotations for coarsely-annotated data. They then utilized a RetinaNet (Lin et al., 2017) to detect hemorrhages. Rather than using the finely annotated IDRiD dataset, they performed hemorrhage detection on a private dataset with coarsely annotated bounding boxes. They first established a dataset containing image pairs: for each pair, one image was taken from IDRiD and the other was obtained by simulating coarsely-annotated bounding boxes. BBR-Net took coarsely annotated patches as input and finely annotated patches as targets. After training, the authors fed in their private data to obtain more accurate bounding-box annotations, and then sent the results to the RetinaNet for hemorrhage detection.
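The simulation of coarse boxes from IDRiD's fine annotations can be sketched as follows (a guess at the spirit of that step; the jitter parameters and function name are illustrative, not taken from Huang et al.):

```python
import numpy as np

def simulate_coarse_box(fine_box, img_size, max_shift=20, max_scale=0.3, rng=None):
    # Perturb a finely annotated (x1, y1, x2, y2) box by a random shift and
    # enlargement, then clip to the image, mimicking a sloppy annotation.
    rng = rng or np.random.default_rng()
    x1, y1, x2, y2 = fine_box
    dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
    grow_w = (x2 - x1) * rng.uniform(0.0, max_scale)
    grow_h = (y2 - y1) * rng.uniform(0.0, max_scale)
    h, w = img_size
    return (float(np.clip(x1 + dx - grow_w, 0, w)),
            float(np.clip(y1 + dy - grow_h, 0, h)),
            float(np.clip(x2 + dx + grow_w, 0, w)),
            float(np.clip(y2 + dy + grow_h, 0, h)))

coarse = simulate_coarse_box((100, 120, 180, 200), img_size=(512, 512),
                             rng=np.random.default_rng(1))
```

Training pairs of (simulated coarse patch, original fine patch) are then what a refiner like BBR-Net would regress over.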
Discussion.
The selective sampling method alleviates the problem of data imbalance. Selective sampling is also used in other applications, which will be introduced in the following sections. The exploration made by Huang et al. (2020) also offers a promising direction: the generation of more accurate bounding-box annotations can be seen as image synthesis, which will be discussed in more detail in Section 5.

However, there are still some limitations in current HE detection applications. First, the imbalance problem needs to be further studied. Second, compared to other lesions, less research has focused on HEs; more attention needs to be paid to this area given its importance in DR diagnosis. Third, pixel-level segmentation and detection are required, so more datasets that provide pixel-level labels for HEs, like DDR, still need to be explored.

MAs are the earliest clinical sign of DR and have thus captured more research interest. There are several barriers affecting the segmentation of MAs, including the existence of other lesions with similar color, extremely low contrast, and variation in image lighting, clarity and background texture. Two-stage multiscale architectures and guidance from clinical reports are among the successful strategies for MA detection.
Two-stage multiscale networks.
Sarhan et al. (2019) proposed a two-stage deep learning approach embedding a triplet loss for microaneurysm segmentation. The first stage is called the hypothesis generation network (HGN), in which multiscale FCNs are employed to generate regions of interest (ROIs). The second stage is known as the patch-wise refinement network (PRN), in which patches extracted from around the ROIs are passed to a modified ResNet-50 for classification. The authors introduced the triplet loss into the PRN to extract discriminative features. Further, the previously mentioned selective sampling method (van Grinsven et al., 2016) is utilized to reduce the computational cost and address the data imbalance problem.
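The triplet loss embedded in the PRN is the standard formulation; a minimal numpy sketch (the margin and the toy embeddings below are illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Embeddings of MA patches (anchor, positive) are pulled together, while
    # non-MA patches (negative) are pushed at least `margin` further away.
    d_ap = np.sum((anchor - positive) ** 2, axis=1)
    d_an = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.maximum(d_ap - d_an + margin, 0.0).mean())

a = np.array([[0.0, 0.0]])
p = np.array([[0.1, 0.0]])
far = np.array([[3.0, 0.0]])    # easy negative: already beyond the margin
near = np.array([[0.5, 0.0]])   # hard negative: inside the margin
```

Only the hard negative contributes a non-zero loss, which is exactly why such a loss encourages discriminative features for visually similar lesions.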
Clinical report guided CNNs.
Dai et al. (2018) proposed a clinical report guided multi-sieving convolutional neural network (MS-CNN) for the detection of MAs. They first trained a weak image-to-text model on clinical reports and fundus images to generate a rough segmentation of microaneurysms. The proposed MS-CNN was then used to generate the final high-quality segmentation, using the rough segmentation as guidance. To tackle the data imbalance problem, MS-CNN adopts a method similar to boosting: it is composed of multiple CNNs, where the false positives from the previous CNN are fed into the following CNN as negative examples.
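The boosting-like cascade can be sketched as follows (a toy with one-dimensional features and a plug-in threshold classifier standing in for MS-CNN's member CNNs; all names here are ours):

```python
import numpy as np

def train_cascade(X, y, n_stages, fit, predict):
    # Each stage trains on all positives plus the negatives that the previous
    # stage wrongly flagged as positive (its false positives), as in MS-CNN.
    stages = []
    pos = (y == 1)
    neg = (y == 0)
    for _ in range(n_stages):
        keep = pos | neg
        model = fit(X[keep], y[keep])
        stages.append(model)
        flagged = predict(model, X) == 1
        neg = neg & flagged      # surviving false positives feed the next stage
        if not neg.any():
            break
    return stages

# Toy stage: threshold halfway between the hardest negative and easiest positive.
fit = lambda X, y: (X[y == 0].max() + X[y == 1].min()) / 2
predict = lambda t, X: (X >= t).astype(int)

X = np.array([0.1, 0.2, 0.5, 0.9, 0.95])
y = np.array([0, 0, 0, 1, 1])
stages = train_cascade(X, y, n_stages=3, fit=fit, predict=predict)
```

The cascade stops early once no false positives remain, so later stages see an increasingly hard, and much smaller, negative set.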
Discussion.
Several effective methods have been employed in MA segmentation, including multiscale networks, guidance from clinical reports and utilization of the triplet loss. The extraction of ROIs and the cascaded architecture adopted in MS-CNN alleviate the imbalance problem. However, the two-stage architecture and the cascaded architecture of MS-CNN lack efficiency: they use multiple base networks, leading to a huge number of parameters to be trained. Thus, one promising direction in MA segmentation would be to reduce the complexity of the networks while maintaining high performance.

Soft and hard exudates are usually the basis for the diagnosis of DR, so accurate detection of SEs and EXs is crucial for timely treatment. As in other lesion detection/segmentation tasks, there are several challenges, including low contrast, varied sizes and similarity to other lesions. There are several approaches to exudate detection, most of which can be divided into CNNs with circular Hough conversion and modifications to the loss function.

CNN with circular Hough conversion.
Adem (2018) introduced a three-layer CNN architecture for the binary classification of exudate and exudate-free fundus images. During pre-processing, the OD region was removed by applying several methods, including adaptive histogram equalization, Canny edge detection and circular Hough conversion.
Modification to loss function.
Guo et al. (2020a) proposed a top-k loss and a bin loss to enhance performance in exudate segmentation. The class-balanced cross entropy (CBCE) loss (Xie and Tu, 2015) solved the class imbalance problem to some extent. However, it introduced a new problem of loss imbalance, where background similar to exudates tends to be misclassified. The main reason is that, with different weights for background and foreground pixels in the CBCE loss, the loss for misclassifying a background pixel is much smaller than that for misclassifying a hard exudate pixel. To solve this loss imbalance problem, the top-k loss is proposed, which considers all hard exudate pixels but only the top-k background pixels with the largest losses. They also proposed a faster version of the top-k loss, named the bin loss, for efficiency.

Discussion.
In exudate detection, some works have focused on modifying the loss function. The top-k loss and bin loss solved the loss imbalance problem caused by the use of CBCE. However, the misclassification problem still remains. Moreover, only baseline models have been tested and no innovative architecture has been proposed. Adem (2018)'s work is based on a plain CNN; more recent models, such as encoder-decoder networks, need to be explored for exudate detection.
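The top-k idea described above can be sketched in a few lines of numpy. This is a hedged simplification, not the authors' exact formulation: the clipping constant, the choice of k and the function name are assumptions.

```python
import numpy as np

def topk_bce_loss(probs, labels, k):
    """Cross-entropy averaged over all foreground (exudate) pixels
    plus only the k hardest background pixels (largest losses)."""
    eps = 1e-7
    probs = np.clip(probs, eps, 1 - eps)
    pixel_loss = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
    fg_loss = pixel_loss[labels == 1]                      # keep every foreground pixel
    bg_loss = np.sort(pixel_loss[labels == 0])[::-1][:k]   # k hardest background pixels
    return np.concatenate([fg_loss, bg_loss]).mean()

probs = np.array([0.9, 0.8, 0.4, 0.2, 0.6, 0.1])
labels = np.array([1, 1, 1, 0, 0, 0])
loss = topk_bce_loss(probs, labels, k=1)
```

The bin loss can be viewed as a faster approximation of this hard-background selection.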
Drusen, the main manifestation of AMD, can be used to assist in its diagnosis. There are four main challenges in drusen segmentation: the yellowish-white color of drusen is similar to that of the background fundus and the OD; uneven brightness and interference from other biomarkers, such as blood vessels, are common; drusen often have irregular shapes; and their boundaries may be blurred.
Deep random walk.
Yan et al. (2018a) proposed a deep random walk method to segment drusen from fundus images. The proposed architecture is composed of three main parts. Fundus images are first passed into a deep feature extraction module, which consists of two branches: a SegNet-like network capturing deep semantic features and a three-layer CNN capturing low-level features. The captured features are then fused together and passed into an affinity learning module to obtain pixel-to-pixel affinities for formulating the transition matrix of the random walk. Finally, a deep random walk module is applied to propagate manual labels. This model achieved state-of-the-art performance on the STARE and DRIVE datasets.

Discussion.
As can be seen, only one effective approach has been introduced so far. Other architectures and methods need to be explored. Further, the segmentation of drusen is closely related to the diagnosis of AMD. Therefore, one direction for future work is to extend drusen segmentation so that it serves as evidence for AMD diagnosis.

Most previous works only segment / detect one type of lesion or treat all lesions as a single group (usually red lesions or bright lesions). However, segmenting multiple lesions simultaneously is of more practical value. More and more researchers are thus focusing on multi-lesion segmentation / detection. The challenges found in the individual lesion detection / segmentation tasks, including imbalance, contrast, illumination, etc., still exist. Further, the inter-class similarity between different lesions, such as HEs and MAs, becomes more prominent. All these factors make multi-lesion segmentation a challenging task.

Tan et al. (2017) conducted the first work to segment multiple lesions, including exudates, haemorrhages and microaneurysms, automatically and simultaneously, using a 10-layer CNN, with the outputs evaluated at the pixel level. Their work demonstrated that it is possible to segment several lesions simultaneously using a single CNN architecture. Carson et al. (2018) used a CNN to perform five-class classification on image patches. The five classes consist of 1) normal, 2) microaneurysms, 3) dot-blot hemorrhages, 4) exudates or cotton wool spots, and 5) high-risk lesions such as neovascularization, venous beading, scarring, and so forth. They invited two ophthalmologists to verify and relabel a subset of the Kaggle dataset containing 243 images. The image patching method was used, proving that good performance can be obtained with such a method, even with limited training samples.
Multiscale networks are important models that have been applied to many fields. Guo et al. (2019) proposed a small object segmentation network (L-Seg) which can segment four kinds of lesions, namely microaneurysms, soft exudates, hard exudates and hemorrhages, simultaneously. The backbone network is VGG-16, which has five groups of convolution layers and three fully connected (FC) layers. They removed all the FC layers and the fifth pooling layer, and added a side extraction layer which consists of a 1 × …

There are two main directions explored in this section, namely dual-decoders and multiscale networks.
Dual-decoders.
Playout et al. (2018) proposed an extension to U-Net capable of segmenting red and bright lesions simultaneously. They were the first to use fully convolutional approaches for joint lesion segmentation. Several novel developments were used in their decoder, including residual connections, global convolutions and mixed-pooling. They used two identical decoders, each specialized for one lesion category. Near the end of training, they also added two fully connected conditional random fields (CRFs) (Krähenbühl and Koltun, 2011). In their subsequent work, Playout et al. (2019) made several modifications. They proposed a novel unsupervised method to enhance segmentation performance by training the network with image-level labels when pixel-level annotations are limited. They also introduced an exchange layer which softly shares parameters between the two decoders, instead of employing hard parameter sharing as previously (Playout et al., 2018).
Multiscale networks.
Yan et al. (2019a) combined local and global features to segment microaneurysms, soft exudates, hard exudates and hemorrhages. A GlobalNet was used to capture more context features, taking a downsampled version of the original images as input. They also employed a LocalNet, which takes cropped image patches as input, aiming to capture more detailed information. GlobalNet and LocalNet both use a U-Net-like encoder-decoder architecture as their backbone.
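The two-branch input preparation described above can be sketched as follows. This is a hedged simplification: the scale factor, patch size and function name are assumptions, and in the actual model each branch is a U-Net-like network consuming these inputs rather than raw crops.

```python
import numpy as np

def prepare_inputs(image, scale=2, patch=32):
    """Downsample the full image for a global (context) branch and
    tile it into crops for a local (detail) branch."""
    # Global branch input: simple stride-based downsampling.
    global_in = image[::scale, ::scale]
    # Local branch inputs: non-overlapping patch x patch crops.
    h, w = image.shape[:2]
    patches = [image[r:r + patch, c:c + patch]
               for r in range(0, h - patch + 1, patch)
               for c in range(0, w - patch + 1, patch)]
    return global_in, patches

img = np.zeros((64, 64))
g, ps = prepare_inputs(img, scale=2, patch=32)
```

The two branches' predictions are then fused to combine context with fine detail.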
Xue et al. (2019) proposed a deep membrane system for simultaneous MA, EX and OD segmentation. A hybrid structure, consisting of a dynamic membrane system and communication channels between cells, was designed. Three types of rules, i.e., T-rules, G-rules and D-rules, were proposed for the computation and communication of the system, solving complex real applications in parallel. Mask-RCNN served as the computational cell of the membrane system.
In this subsection, we have seen that various base networks have been applied to multi-lesion detection, including recent models like U-Net and Mask-RCNN. Multiscale methods have also been explored and have proven quite suitable for this task. Architectural modifications have also been introduced, with the dual-decoder being one notable example; such a framework may work well in other similar scenarios. The proposed deep membrane system is quite innovative in fundus image analysis and is expected to be further explored. However, there are still some limitations. First, compared to other segmentation and detection tasks like blood vessel segmentation and OD / OC segmentation, the performance, in particular the sensitivity, of lesion segmentation / detection needs to be further improved. Second, several works still focus on red or bright lesion segmentation instead of individual lesions. However, the specific segmentation and detection of individual lesions is more practical. Third, pixel-wise segmentation should be emphasized, and datasets with pixel-level lesion annotations deserve more attention.
3. Biomarker segmentation
Segmentation of retinal blood vessels is of paramount importance in the diagnosis of various ophthalmic diseases, including diabetic retinopathy and glaucoma (Abràmoff et al., 2010). With the use of powerful deep learning techniques such as CNNs, FCNs and, recently, U-Net, excellent performance has been achieved. However, several factors still make retinal blood vessel segmentation a challenging task. These include varying contrast and intensity among different datasets, inter-vessel differences between thick and thin vessels, the presence of the optic disc and lesions, limited annotated data and so on. We will discuss how these problems have been addressed in the following subsections.

The most commonly used datasets in retinal blood vessel segmentation are DRIVE, STARE, CHASE_DB1 and HRF. The DRIVE dataset (Staal et al., 2004) consists of 40 images, seven of which show signs of mild early DR. DRIVE is officially divided into a training set and a test set, both containing 20 images. A single manual segmentation of the vessels is provided for the training set and two manual segmentations are provided for the test set. Border masks are also available for all images. The STARE dataset (Hoover et al., 2000) consists of 400 images, 20 of which have two manual blood vessel segmentations annotated by two experts. Ten of the images contain pathologies. Coarsely annotated centerline-level artery / vein labels for 10 images are also provided. The CHASE_DB1 dataset (Owen et al., 2009) consists of 28 images, obtained from both eyes of 14 multi-ethnic school children. The HRF dataset (Budai et al., 2013) consists of 45 images, of which 15 are healthy, 15 have DR and 15 are glaucomatous. Compared to the other three datasets, images from HRF have a higher resolution (3504 × …). Results on the different datasets are shown in Tab. 7, 8, 9 and 10.
Before fully convolutional networks were widely used, vessel segmentation was regarded as a pixel-by-pixel classification task and structured prediction was still a problem to be solved. The usual approach was to crop the images into patches and use a CNN whose last few layers are fully connected to predict the label of the center pixel of each patch. The approach proposed by Khalaf et al. (2016) is typical: they used a CNN containing three conv layers to perform vessel segmentation, in which the last FC layer contains three neurons representing the probability of the central pixel being a large vessel, a small vessel or background, respectively. Yu et al. (2020) also used a CNN whose last layers are FC layers for vessel segmentation, and conducted further research based on the segmentation results. They first extracted vascular trees from the segmented vessels using a graph-based method. Then two algorithms were proposed for the hierarchical division of retinal vascular networks.
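The generic patch-wise scheme can be sketched as below, with a trivial intensity threshold standing in for the CNN (every name and the threshold are illustrative, not taken from the cited works):

```python
import numpy as np

def center_pixel_patches(image, half=2):
    """Yield (patch, center_coords) pairs; reflection padding gives
    border pixels a full (2*half+1) x (2*half+1) neighbourhood."""
    padded = np.pad(image, half, mode='reflect')
    h, w = image.shape
    for r in range(h):
        for c in range(w):
            yield padded[r:r + 2 * half + 1, c:c + 2 * half + 1], (r, c)

def toy_classifier(patch):
    # Stand-in for a CNN: call the center pixel "vessel" if the
    # patch mean intensity exceeds a threshold.
    return int(patch.mean() > 0.3)

img = np.zeros((8, 8))
img[:, 3:5] = 1.0               # a vertical "vessel" two pixels wide
pred = np.zeros_like(img, dtype=int)
for patch, (r, c) in center_pixel_patches(img, half=2):
    pred[r, c] = toy_classifier(patch)
```

One network forward pass per pixel makes this scheme expensive, which is what motivated the move to fully convolutional structured prediction discussed next.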
Structured prediction has been explored by several researchers. Liskowski and Krawiec (2016) used a CNN for segmentation, whose FC layer comprises two neurons representing vessel and background. They also explored a structured prediction scheme, which can simultaneously predict the labels of all pixels in an s × s window of an n × n patch. Their approach was to set the number of neurons in the last FC layer to s × s; each neuron represents one pixel in the window, and the output is a set of two-dimensional vectors instead of scalars.

Fully convolutional networks provide an end-to-end solution, addressing the issue of structured prediction. Hence, they were quickly applied to vessel segmentation. Fu et al. (2016) proposed a fully convolutional network called DeepVessel. They employed a side-output layer to help the network learn multiscale features. At the end of the net, a CRF layer was used to further model non-local pixel correlations. Dasgupta and Singh (2017) also proposed a fully convolutional network, containing six conv layers, one downsampling layer and one upsampling layer. Feng et al. (2017) proposed a fully convolutional network which can be considered a simplified version of U-Net; their network only upsamples and downsamples twice. In order to address the class imbalance between background and blood vessels, they defined an entropy which measures the proportion of vessel pixels in a patch. During training, half of the patches are selected from those with the highest entropy, and the other half are selected at random. Oliveira et al. (2018) proposed an FCN architecture similar to that of Feng et al. (2017). In the pre-processing phase, they utilized a stationary wavelet transform (SWT) to obtain additional channels for the input images. Hu et al. (2018) proposed a multiscale network inspired by RCF (Liu et al., 2017), which merges the feature maps of every middle layer with the output.
Similar to Guo et al. (2019), they also removed all the FC layers of their VGG-16 backbone network. At the end of the net, fully connected CRFs are employed. An improved cross-entropy loss was also proposed to focus on hard examples.
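The entropy-guided patch selection of Feng et al. (2017) described above can be approximated as follows. This is a hedged sketch: the "entropy" is taken directly as the vessel-pixel proportion of each label patch, and the names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_patches(label_patches, n):
    """Pick n patch indices: half with the highest vessel proportion,
    half uniformly at random from the remainder."""
    scores = np.array([p.mean() for p in label_patches])  # vessel-pixel proportion
    order = np.argsort(scores)[::-1]                      # most vessel-rich first
    top = order[: n // 2]
    rand = rng.choice(order[n // 2:], size=n - n // 2, replace=False)
    return np.concatenate([top, rand])

patches = [np.zeros((8, 8)) for _ in range(6)]
patches[2][:, :] = 1.0      # densest "vessel" patch
patches[5][:4, :] = 1.0     # half vessel
idx = select_patches(patches, n=4)
```

Mixing informative and random patches biases training towards vessels without discarding background context entirely.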
Because of their excellent ability to extract features and extraordinary performance in practice, encoder-decoder architectures, especially U-Net, are still the most popular segmentation frameworks applied to fundus images. There are many directions for improvement in this area, as discussed next.
Treating thick and thin vessels differently.

In order to improve performance on capillaries, one possible solution is to treat thick and thin vessels differently. Zhang and Chung (2018) proposed a multi-label architecture. They used opening and dilation operations to expand the original vessel and background classes into five classes, namely 0 (other background pixels), 1 (background near thick vessels), 2 (background near thin vessels), 3 (thick vessels) and 4 (thin vessels). The proposed architecture uses a U-Net with residual connections as the backbone. A side-output layer was also introduced to capture multiscale features. He et al. (2018) introduced an operation named local de-regression (LODESS) to obtain additional labels. After LODESS, the original binary labels (vessel and background) are further divided into five classes, specifically 0 (the center of big vessels), 1 (the edge of big vessels), 2 (the center and edge of small vessels), 3 (the center of the background) and 4 (the edge of the background). Yan et al. (2018b) introduced a segment-level loss which assigns different weights to different segments according to their thickness. They first obtained vessel segments from the whole vessel tree based on skeletonization and then estimated the relative thickness of each segment. A weight-assigning strategy was then designed to give thinner segments higher weights. Yan et al. (2019b) proposed a three-stage model for vessel segmentation. They first applied a skeletonization method to extract the skeletons. For each skeleton pixel, the diameter of the maximum inscribed circle completely covered by vessel pixels is taken as the thickness. A ThickSegmenter and a ThinSegmenter were utilized for thick and thin vessel segmentation, respectively. Note that, when calculating the loss, only thick vessel pixels were counted for the ThickSegmenter and only thin vessels for the ThinSegmenter.
Finally, the results of the two segmenters were passed to a FusionSegmenter to get the final result.

Coarse-to-fine segmentation.
This is another approach that employs two branches: the first takes fundus images as input to get a preliminary result and the second further refines it. Wu et al. (2018) proposed a multiscale network followed network (MS-NFN) to improve performance on capillaries. Input images are passed into two different branches, namely the 'up-pool' NFN and the 'pool-up' NFN. The two branches both have identical U-Net-like structures. The first network converts input patches into a probability map, and the second performs further refinement. The difference between the two NFNs is that the 'up-pool' NFN upsamples before downsampling, and the 'pool-up' NFN does the opposite. Finally, the probability maps of the two NFNs are averaged to generate the final prediction. In their subsequent work (Wu et al., 2020), they added some modifications to NFN to form a new network named NFN+. Compared to NFN, the main extensions include: introducing inter-network connections between the preceding and following networks; replacing the 'up-pool' and 'pool-up' networks with an identical U-Net-like architecture; and removing the ensemble operation. Wang et al. (2020) proposed a coarse-to-fine supervision network (CTF-Net) for vessel segmentation. Their CTF-Net consists of two U-shaped networks, namely the coarse segNet, producing a preliminary predicted map, and the fine segNet, further enhancing performance. They also proposed a feature augmentation module (FAM-residual block) to improve the ability of the network to extract features.

Table 6. Widely used datasets for vessel segmentation

Dataset name | Number of images | Resolution | Camera | Availability
DRIVE | 40 (33 healthy, 7 mild early DR) | 768 × 584 | Canon CR5 non-mydriatic 3CCD camera, FOV 45° | available on registration: https://drive.grand-challenge.org/Download/
STARE | 400 (vessel segmentation labeling of 40, A/V labeling of 10) | 700 × 605 | TopCon TRV-50 fundus camera, FOV 35° | available online: http://cecas.clemson.edu/~ahoover/stare/
CHASE_DB1 | 28 | 1280 × 960 | - | available online: https://blogs.kingston.ac.uk/retinal/chasedb1/
HRF | 45 (15 each of healthy, DR and glaucomatous) | 3504 × … | - | available online: http://…/research/data/fundus-images/

Table 7. Summary of several results for vessel segmentation on DRIVE dataset

Reference | Backbone | Loss | SE/% | SP/% | ACC/% | AUC/% | F1/%
Khalaf et al. (2016) | CNN | - | 83.97 | 95.62 | 94.56 | - | -
Liskowski and Krawiec (2016) | CNN | CE | … | … | … | … | …
Ma et al. (2019) | U-Net | CE | 79.16 | 98.11 | 95.70 | 98.10 | -
Zhao et al. (2020a) | Dense U-Net | global pixel loss, local matting loss | 83.29 | 97.67 | - | - | 82.29
Mishra et al. (2020) | U-Net | CE | 89.16 | 96.01 | 95.40 | 97.24 | -
Feng et al. (2020) | FCN | MSE | 76.25 | 98.09 | 95.28 | 96.78 | -
Cherukuri et al. (2020) | Residual FCN | MSE | 84.25 | … | … | … | …
Kromm and Rohr (2020) | CapsNet | margin loss | 76.51 | 98.18 | 95.47 | 97.50 | -
Liu et al. (2019a) | No-reference net | MSE | 80.72 | 97.80 | 95.59 | 97.79 | 82.25

Table 8. Summary of several results for vessel segmentation on STARE dataset

Reference | Backbone | Loss | SE/% | SP/% | ACC/% | AUC/% | F1/%
Liskowski and Krawiec (2016) | CNN | CE | … | … | … | … | …
Hu et al. (2018) | FCN | improved CE | 75.43 | 98.14 | 96.32 | 97.51 | -
Feng et al. (2020) | FCN | MSE | 77.09 | 98.48 | 96.33 | 97 | -
Soomro et al. (2019) | SegNet | CBCE | 84.8 | 98.6 | 96.8 | 98.8 | -
Cherukuri et al. (2020) | Residual FCN | MSE | 86.64 | 98.95 | 98.03 | 99.35 | -
Zhao et al. (2020a) | Dense U-Net | global pixel loss, local matting loss | 84.33 | 98.57 | - | - | 83.51
Mishra et al. (2020) | U-Net | CE | 87.71 | 96.34 | 95.71 | 97.42 | -
Liu et al. (2019a) | No-reference net | MSE | 77.71 | 98.43 | 96.23 | 97.93 | 80.36

Table 9. Summary of several results for vessel segmentation on CHASE_DB1 dataset

Reference | Backbone | Loss | SE/% | SP/% | ACC/% | AUC/% | F1/%
Fu et al. (2016) | FCN | CBCE | 71.30 | - | 94.89 | - | -
Oliveira et al. (2018) | FCN | categorical CE | 77.79 | 98.64 | 96.53 | 98.55 | -
Zhang and Chung (2018) | U-Net | CE | 76.70 | … | … | … | …

Table 10. Summary of several results for vessel segmentation on HRF dataset

Reference | Backbone | Loss | SE/% | SP/% | ACC/% | AUC/% | F1/%
Soomro et al. (2019) | SegNet | CBCE | … | … | … | … | …
Zhao et al. (2020a) | Dense U-Net | global pixel loss, local matting loss | 78.09 | - | - | … | …

Multiscale networks.
This is another important direction that has been explored. Wu et al. (2019) proposed Vessel-Net, which is based on the multiscale method. They first implemented an Inception-Residual (IR) block, inspired by Inception and ResNet, that can be embedded into U-Net. Four supervision paths were introduced to the net, including: a traditional supervision path; a richer feature supervision path, which resizes all stages of the encoder's output to the same size as the input patches (48 × 48) and then concatenates them; and two multiscale supervision paths, where feature maps generated by the encoder with sizes 12 × 12 and 24 × 24 are passed into a 1 × …

Improvements to sampling operation.
The downsampling and upsampling operations change the resolution of the feature maps, which is not ideal for the segmentation task. Several works have thus tried to improve or replace these two operations. Soomro et al. (2019) proposed a strided-CNN model to improve sensitivity. They first performed pre-processing, including morphological mappings and principal component analysis (PCA). The processed images were then passed to a SegNet-like encoder-decoder architecture, in which the pooling operation was replaced with strided convolutions, inspired by Springenberg et al. (2015). Zhang et al. (2019a) proposed the Attention Guided Network (AG-Net) for vessel segmentation, built around an attention guided filter inspired by He et al. (2013). Specifically, the filter takes high-resolution feature maps from the encoder and low-resolution feature maps from the lower stage of the decoder as input, and produces high-resolution feature maps as output. The attention guided filter can preserve edge and structural information. Note that AG-Net can also perform OD / OC segmentation.
Dual-encoder.
Wang et al. (2019a) proposed the Dual Encoding U-Net (DEU-Net), which consists of two encoders. The first, inspired by the global convolutional network (Peng et al., 2017), has a spatial path with larger kernels to capture more spatial information. The second, inspired by Inception, has a context path with multiple kernel sizes to capture more context features. A feature fusion module was proposed to fuse the features extracted by the two encoders at the top stage. Channel attention was used to replace the skip-connections of the original U-Net.
Data-aware deep supervision.
Mishra et al. (2020) added a data-aware deep supervision path to a U-Net-like network. Building on the concept of the effective receptive field (ERF) proposed by Luo et al. (2016), whereby the output-affecting region is actually smaller than the theoretical receptive field (RF), they proposed layer-wise effective receptive fields (LERFs), calculated from the gradient of the loss function using back-propagation. The average vessel width was taken as the target object size. The convolutional layer with the smallest absolute difference between its LERF and the vessel width was selected as the target layer and considered the preeminent layer. Deep supervision was applied at the target layer.

Spatial activation using a Gaussian function.
Ma et al. (2019) proposed a multitask network which can perform vessel segmentation and A/V classification simultaneously. In view of the observation that the values of capillary and boundary vessels in a probability map are close to 0.5, they proposed a spatial activation module that assigns higher weights to thin vessels via a Gaussian function. Deep supervision was also utilized.
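A hedged re-implementation of such a Gaussian spatial activation might look like this; the width σ, the scale α and the exact functional form are assumptions rather than the published formula.

```python
import numpy as np

def spatial_activation(prob_map, sigma=0.1, alpha=4.0):
    """Weight map peaking where the predicted probability is near 0.5,
    i.e. at thin and boundary vessels."""
    return 1.0 + alpha * np.exp(-((prob_map - 0.5) ** 2) / (2.0 * sigma ** 2))

p = np.array([0.05, 0.5, 0.95])   # confident bg, uncertain thin vessel, confident vessel
w = spatial_activation(p)
```

Multiplying the per-pixel loss by such a map concentrates training on the uncertain thin-vessel regions while leaving confident pixels near weight 1.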
Several other methods have been proposed to improvemodel performance. They not only achieve good experimentalperformance, but are also inspiring.
Image matting method.
Zhao et al. (2020a) transformed the segmentation problem into a related matting problem. A trimap was first obtained using bi-level thresholding of the score map. Then the retinal images and corresponding trimaps were fed to an end-to-end matting network to get the foreground matte. They proposed a local matting loss together with a global pixel loss for training. The final segmentation map was obtained by applying a threshold to all pixels of the matte.
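The trimap step can be sketched as a bi-level thresholding of the score map; the thresholds and the label encoding below are assumptions for illustration.

```python
import numpy as np

def make_trimap(score_map, lo=0.2, hi=0.8):
    """Split a vessel score map into definite background (0),
    unknown (1) and definite foreground (2)."""
    trimap = np.ones_like(score_map, dtype=np.uint8)  # unknown by default
    trimap[score_map < lo] = 0
    trimap[score_map > hi] = 2
    return trimap

scores = np.array([[0.05, 0.50],
                   [0.90, 0.30]])
tm = make_trimap(scores)
```

Only the "unknown" band is then left for the matting network to resolve.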
Inception capsule network.
Kromm and Rohr (2020) combined the capsule network (Sabour et al., 2017) with the Inception architecture for vessel segmentation and centerline extraction. Their Inception capsule network has a shallow architecture with few parameters and does not need data augmentation.
Ensemble learning.
Liu et al. (2019a) proposed a novel and simple unsupervised ensemble strategy for vessel segmentation. They combined the outputs of the best-performing recent networks through a weighted sum; the weights were themselves trained, and the ensemble finally obtained better results than any single network.

Regularization under geometric priors.
Cherukuri et al. (2020) proposed a domain-enriched deep network for vessel segmentation. A representation network was first employed. Two geometric regularizers, an orientation diversity regularizer and a data-adaptive noise regularizer, were added to the loss function to learn specific geometric features. After that, they introduced a network containing residual blocks with no downsampling / upsampling steps, instead of using a U-Net like most other works.

Performing segmentation on RIM-ONE.
Nasery et al. (2020) performed vessel segmentation on the RIM-ONE dataset. Compared to the DRIVE dataset, RIM-ONE is of lower quality and does not have any vessel annotations. Instead of performing image synthesis to obtain high-quality images, they transformed high-quality images with expert labels from the DRIVE dataset to resemble the poor-quality target images. To accomplish this, substantial vignetting masks were used. Then a U-Net was trained using the resulting images and their corresponding labels. Once trained, the net could be used to obtain vessel masks for images from the RIM-ONE dataset.
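The vignetting-based degradation can be imitated with a simple radial mask. This is a sketch under assumed parameters; the transformation actually used in the paper may differ.

```python
import numpy as np

def apply_vignette(image, strength=2.0):
    """Darken an image towards its borders with a radial mask, so a
    high-quality source image resembles a poor-quality target."""
    h, w = image.shape[:2]
    rows, cols = np.mgrid[0:h, 0:w]
    d = np.hypot(rows - (h - 1) / 2, cols - (w - 1) / 2)
    d = d / d.max()                     # 0 at the centre, 1 at the corners
    mask = (1.0 - d ** 2) ** strength
    return image * mask

img = np.ones((32, 32))
out = apply_vignette(img)
```

Training on (degraded image, original label) pairs then transfers DRIVE annotations to the unlabeled, lower-quality domain.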
From the models discussed in this section, we can see the development of the base networks used for vessel segmentation, from CNNs to FCNs to U-Net-like architectures. The use of CNNs and FCNs has become less common recently, while U-Net-like architectures are very popular. However, the feature extraction ability of U-Net is inadequate: U-Net only has 10 convolution operations in its encoder, which is even fewer than VGG-16. Therefore, several works have focused on how to improve the feature extraction ability; alternatives include Dense U-Net, Residual U-Net and dual-encoder networks. Another disadvantage of U-Net is its four paired sampling operations (downsampling and upsampling), which is not ideal for the segmentation task. Several studies have tried to alleviate this problem by, for instance, using a shallower version of U-Net with two or three paired sampling operations. Multiscale methods can also be utilized to improve segmentation performance, allowing the network to attend to both low-level spatial features and high-level semantic features. In order to address the poor performance on thin and edge vessels, methods for treating thick and thin vessels differently have been explored. It is worth noting that there are several other inspiring studies, including data-aware deep supervision, spatial activation, image matting, Inception capsule networks, ensemble learning and performing segmentation on RIM-ONE.

However, there are still several limitations. First, there is still room for improvement in the segmentation of thin and edge vessels. Specifically, sensitivity and accuracy need to be further improved while maintaining specificity and AUC. Second, there are only three commonly used datasets for vessel segmentation, namely DRIVE, STARE and CHASE_DB1, and they contain fewer images than datasets for other tasks. On the one hand, more experiments need to be carried out on the high-resolution HRF dataset.
On the other hand, more images should be collected and annotated. Researchers can also employ image synthesis methods. Third, the imbalance problem also exists in the vessel segmentation task, and it is even more challenging to solve than in other tasks, such as lesion and OD / OC segmentation, because of the irregular shape of blood vessels. The typical approach is to use a class-balancing loss; a selective sampling method based on entropy could also be effective. More attention should thus be paid to the imbalance problem.

OD / OC Segmentation
Cup-to-disc ratio (CDR) is a widely accepted standard for the diagnosis of glaucoma. It is calculated as the ratio of the vertical cup diameter (VCD) to the vertical disc diameter (VDD) (Phene et al., 2019). The segmentation of the optic cup (OC) and optic disc (OD) is therefore very important for the diagnosis of glaucoma. Compared to OD segmentation, OC segmentation is more challenging due to its subtle boundaries. Further, there is an imbalance problem for the OC, as the OC region only accounts for a low proportion of the extracted ROIs.

The datasets used in this field are shown in Tab. 11. Similar to lesion segmentation / detection, there is an overlap between the datasets used for OD / OC segmentation and glaucoma diagnosis. The ONHSD dataset (Lowell et al., 2004) consists of 99 images obtained from 50 patients of various ethnic backgrounds; 96 images have a discernible ONH. The Drions-DB dataset (Carmona et al., 2008) consists of 110 images, belonging to 55 patients with glaucoma (23.1%) or eye hypertension (76.9%), obtained from a hospital in Spain. The ORIGA dataset (Zhang et al., 2010) consists of 650 images, of which 168 are glaucomatous and 482 are normal. The boundaries of the OD and OC, the CDR value and a label indicating whether glaucoma is present are provided for each image. The RIM-ONE-r3 dataset (Fumero et al., 2011) consists of 169 ONH images, of which 118 are normal, 11 have ocular hypertension (OHT) and 40 are glaucomatous. Five-class labels were provided by five experts. The ACHIKO-K dataset (Zhang et al., 2013) consists of 258 images obtained from 67 glaucomatous patients from Korea; 144 images are of glaucomatous eyes and 114 are normal. The Drishti-GS dataset (Sivaswamy et al., 2014) contains 101 images, officially divided into 50 training images and 51 test images, obtained from a hospital in India.
The SCES dataset (Baskaran et al., 2015) consists of 1,676 images, each from a single subject, for which only clinical diagnoses are provided; 46 images are glaucomatous. The RIGA dataset (Almazroa et al., 2018) is made up of three parts, namely 460 images from MESSIDOR, 195 images from the Bin Rushed Ophthalmic Center and 95 images from the Magrabi Eye Center, for a total of 750 images. Each image was manually annotated by six ophthalmologists. The LAG dataset (Li et al., 2020b) contains 11,760 images, of which 6,882 do not have glaucoma and 4,878 are suspicious. 5,824 images were further annotated with attention labels, of which 2,392 display glaucoma and the remaining 3,432 do not. Experimental results are shown in Tab. 12, 13, 14, 15 and 16. In the tables, "A" denotes balanced accuracy, "E" the widely used overlapping error, and δ the absolute CDR error.

Similar to blood vessel segmentation, fully convolutional networks were widely used in early OD / OC segmentation. Edupuganti et al. (2018) used FCN-8s to perform OD / OC segmentation. They also explored various strategies, such as assigning higher weights to edges in the loss function, for further improvement.
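Given binary OC and OD masks, the vertical CDR defined at the start of this subsection can be computed directly. This is a minimal sketch; the function names and mask conventions are illustrative.

```python
import numpy as np

def vertical_cdr(cup_mask, disc_mask):
    """Vertical cup-to-disc ratio: VCD / VDD, where each vertical
    diameter is the number of image rows the structure spans."""
    def vertical_diameter(mask):
        rows = np.nonzero(mask.any(axis=1))[0]  # rows containing the structure
        return rows.max() - rows.min() + 1
    return vertical_diameter(cup_mask) / vertical_diameter(disc_mask)

# Toy masks: a 20-row disc containing a 10-row cup.
disc = np.zeros((40, 40), dtype=bool); disc[10:30, 10:30] = True
cup = np.zeros((40, 40), dtype=bool); cup[15:25, 15:25] = True
cdr = vertical_cdr(cup, disc)
```

Errors in either segmentation propagate directly into the CDR, which is why OC boundary quality matters so much for glaucoma screening.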
Using atrous convolutions is quite common in this field. Mohan et al. (2018) proposed a structure named Fine-Net, which has a symmetrical encoder-decoder architecture. Inspired by full-resolution residual networks (FRRNs) (Pohlen et al., 2017), they used several full-resolution residual units (FRRUs) in Fine-Net. Atrous convolutions were also introduced to reduce the memory cost while maintaining reliable performance. In their subsequent work, Mohan et al. (2019) proposed P-Net, which obtains preliminary segmentation results by taking downscaled images as input. The output of P-Net is upscaled and then sent to Fine-Net as guidance for further segmentation. DenseBlocks and atrous convolutions are combined into dense atrous blocks (DBs) to form P-Net. DBs with different dilation rates are used to capture multiscale features, inspired by atrous spatial pyramid pooling (ASPP) (Chen et al., 2017). Liu et al. (2019e) proposed a spatial-aware neural network which adopts a multiscale method. First, an atrous CNN model is used to extract spatially denser feature maps. Then, the extracted features are fed to a pyramid filtering module to obtain multiscale features. Finally, the multiscale features are passed to a spatial-aware segmentation network to get the final result.

There are also several works using U-Net as the baseline. Fu et al. (2018a) proposed M-Net for OD / OC segmentation. M-Net contains a multiscale input layer: images are downsampled to form an image pyramid as U-Net's input, providing a multi-level receptive field. In order to segment the OD and OC simultaneously, the authors proposed a multi-label loss based on the dice loss, intended to solve the problems of multiple labels and data imbalance. Moreover, a polar transformation was introduced to obtain spatial consistency and increase the proportion of the OD / OC in a patch. Their approach greatly inspired subsequent work.

Shah et al. (2019) proposed two different methods, named the Parameter-Shared Branched Network (PSBN) and Weak Region of Interest Model-based segmentation (WRoIM). With fewer parameters, they obtained performance comparable to state-of-the-art approaches. PSBN has two branches, used to generate masks for the OD and OC, respectively. The encoders of the two branches share parameters, and the OC branch uses cropped activations from the OD branch. WRoIM first obtains a coarse OD area through a small U-Net structure (one conv block for downsampling and one conv block for upsampling), and uses the extracted ROI as another U-Net's input to perform fine segmentation.

Guidance from a depth estimation task can boost the performance of OD / OC segmentation. Shankaranarayana et al. (2019) proposed a fully convolutional network for retinal depth estimation and used the results to guide OD / OC segmentation. They proposed a Dilated Residual Inception (DRI) module utilizing convolution kernels with different dilation rates, in the manner of an Inception block, to extract multiscale features. To ensure that the retinal depth estimation branch guides OD / OC segmentation, they proposed a multimodal feature fusion module to fuse feature maps from the depth estimation branch and the OD / OC segmentation branch.
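The polar transformation used by M-Net above can be sketched as a nearest-neighbour resampling from Cartesian to (radius, angle) coordinates. This is a simplification; the transformation details in M-Net may differ.

```python
import numpy as np

def to_polar(image, n_radii=32, n_angles=64):
    """Resample a square ROI around its centre onto a (radius, angle)
    grid via nearest-neighbour lookup."""
    h, w = image.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    radii = np.linspace(0, min(cy, cx), n_radii)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    rr = np.round(cy + radii[:, None] * np.sin(angles)[None, :]).astype(int)
    cc = np.round(cx + radii[:, None] * np.cos(angles)[None, :]).astype(int)
    return image[rr, cc]

img = np.zeros((33, 33))
img[16, 16] = 1.0               # bright spot at the ROI centre
polar = to_polar(img)
```

Under this mapping the roughly circular OD/OC become approximately horizontal bands, and their share of the patch area grows, which helps with the imbalance problem noted earlier.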
Pixel-wise deep regression is a novel direction under exploration. Meyer et al. (2018) reformulated the segmentation task as a pixel-wise regression task to perform OD and fovea detection simultaneously. A bi-distance map is first obtained, giving the distance between every pixel and its nearest landmark, namely the OD or the fovea. A U-Net-like deep network is then used for distance regression, yielding a globally consistent prediction map.
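The bi-distance map that this formulation regresses can be sketched as follows; the function name and the plain Euclidean distance are assumptions for illustration:

```python
import numpy as np

def bi_distance_map(shape, landmarks):
    # Euclidean distance from every pixel to its nearest landmark
    # (here: the OD centre and the fovea).
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    dist = np.full(shape, np.inf)
    for ly, lx in landmarks:
        dist = np.minimum(dist, np.hypot(ys - ly, xs - lx))
    return dist
```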
The RPN method from Faster R-CNN (Ren et al., 2015) and Mask R-CNN (He et al., 2017) has also been applied to the segmentation of the OD/OC. Inspired by RPN, Wang et al. (2019f) proposed the Ellipse Proposal Network (EPN) to detect ellipse regions, parameterizing each elliptical anchor by its centre coordinate (X, Y) together with its major and minor axes. Two EPN branches were used to detect the OD and OC, and a spatial attention path was introduced between the OD and OC branches to guide the detection of the OC. Yin et al. (2019) proposed PM-Net, which is also inspired by RPN. They introduced a segmentation branch to RPN to provide more accurate proposals for localization of the optic nerve head (ONH) area. A pyramid RoIAlign module was also proposed to capture multiscale features. Jiang et al. (2020) proposed JointRCNN for OD and OC segmentation. An atrous-VGG16 is first used for feature extraction. The extracted features are passed to two parallel branches, i.e., a disc proposal network (DPN) and a cup proposal network (CPN), for OD and OC region proposal. A disc attention module is employed between the DPN and CPN to determine where the OC is located based on the result of the DPN.

Domain adaptation is another direction that has been explored. Wang et al. (2019c) proposed the patch-based Output Space Adversarial Learning (pOSAL) framework. Images from the source and target domains are first passed into a lightweight network to extract ROIs. The extraction network is based on DeepLabv3+ (Chen et al., 2018) but uses MobileNetV2 (Sandler et al., 2018) as the backbone. The authors also designed a morphology-aware segmentation loss for the network. The extracted ROIs are then passed to a patch discriminator (PatchGAN) for adversarial learning.

Table 11. Widely used datasets for OD/OC segmentation and glaucoma diagnosis/grading

| Dataset | Number of images | Resolution | Camera | Availability |
| ONHSD | 100 | 640 × 480 | Canon CR6 45MNf fundus camera, FOV 45° | available online |
| Drishti-GS | 101 | 2896 × … | - | - |
| Drions-DB | 110 | 600 × 400 | colour analogical fundus camera | available online |
| ORIGA | 650 (168 glaucomatous, 482 normal) | 3072 × … | - | - |
| RIM-ONE | 169 ONH | - | Nidek AFC-210 fundus camera with a Canon EOS 5D Mark II body (21.1 megapixels) | not available online |
| ACHIKO-K | 258 (144 glaucomatous) | 640 × … | - | - |
| SEED | 235 (43 glaucoma) | - | - | not available online |
| REFUGE | 1200 | 2124 × … | - | - |
| SCES | 1676 | 3072 × … | - | - |

Dataset homepages:
http:///Image%20Datasets/ONHSD.aspx
http://cvit.iiit.ac.in/projects/mip/drishti-gs/mip-dataset2/Home.php
https:///publication/
https://deepblue.lib.umich.edu/data/concern/data sets//
https://oar.a-star.edu.sg/jspui/handle//=full
https://refuge.grand-challenge.org/
https://github.com/smilell/AG-CNN
Table 12. Summary of several results for OD/OC segmentation on the Drishti-GS dataset

| Reference | Backbone | Loss | OD Dice/% | OD IoU/% | OC Dice/% | OC IoU/% | δ |
| Edupuganti et al. (2018) | FCN | weighted CE | - | 69.58 | - | 81.22 | - |
| Mohan et al. (2018) | FCN | bootstrapped CE and Dice loss | 96.4 | - | - | - | - |
| Mohan et al. (2019) | FCN | bootstrapped CE and Dice loss | 97.13 | - | - | - | - |
| Liu et al. (2019e) | FCN | spatial-aware error function | - | 89 | - | - | - |
| Shankaranarayana et al. (2019) | Encoder-decoder net | multi-class CE | 96.3 | - | 84.8 | - | 0.1045 |
| Shah et al. (2019) (PSBN) | U-Net | logarithmic dice loss | 95 | 91 | 88 | - | - |
| Shah et al. (2019) (WRoIM) | U-Net | logarithmic dice loss | 96 | - | - | - | - |
| Wang et al. (2019c) | DeepLab, GAN | dice coefficient loss, smoothness loss and adversarial loss | 97.4 | - | - | - | - |
| Wang et al. (2019b) | DeepLabv3+, GAN | CE, MSE, adversarial loss | 96.1 | - | 86.2 | - | - |

Table 13. Summary of several results for OD/OC segmentation on the ORIGA dataset
| Reference | Backbone | Loss | OD A/% | OD E | OC A/% | OC E | Rim A/% | Rim E | δ |
| Liu et al. (2019e) | FCN | spatial-aware error function | - | 0.059 | - | - | - | - | - |
| Fu et al. (2018a) | U-Net | proposed multi-label loss | 98.3 | 0.071 | 93.0 | 0.230 | 94.1 | 0.233 | 0.071 |
| Shankaranarayana et al. (2019) | Encoder-decoder net | multi-class CE | 97.4 | - | - | - | - | - | - |
| Jiang et al. (2020) | atrous CNN and RPN | Smooth L1 loss and BCE | - | 0.063 | - | 0.209 | - | - | 0.068 |

Table 14. Summary of several results for OD/OC segmentation on the RIM-ONE-r3 dataset
| Reference | Backbone | Loss | OD Dice/% | OD IoU/% | OC Dice/% | OC IoU/% | δ |
| Shankaranarayana et al. (2019) | Encoder-decoder net | multi-class CE | - | - | - | - | 0.066 |
| Shah et al. (2019) (PSBN) | U-Net | logarithmic dice loss | 91 | 84 | 75 | 60 | - |
| Shah et al. (2019) (WRoIM) | U-Net | logarithmic dice loss | 94 | - | 82 | - | - |
| Wang et al. (2019c) | DeepLab, GAN | dice coefficient loss, smoothness loss, adversarial loss | 96.8 | - | 85.6 | - | - |
| Wang et al. (2019b) | DeepLabv3+, GAN | CE, MSE, adversarial loss | 89.8 | - | 81.0 | - | - |

Table 15. Summary of several results for OD/OC segmentation on the REFUGE dataset
| Reference | Backbone | Loss | OD Dice/% | OC Dice/% | δ |
| Wang et al. (2019f) | RPN | weighted CE, regression loss | 95.3 | 87.2 | 0.047 |
| Yin et al. (2019) | RPN | multi-label CE | - | - | - |
| - | - | dice coefficient loss, smoothness loss and adversarial loss | - | - | - |
| Liu et al. (2019d) | GAN | dice segmentation loss, adversarial loss and MSE loss | 94.16 | 86.27 | 0.0481 |
Table 16. Summary of several results for OD/OC segmentation on other datasets

| Reference | Dataset | Backbone | Loss | OD Dice/% | OC Dice/% | δ |
| Mohan et al. (2018) | DrionsDB | FCN | bootstrapped CE, Dice loss | 95.5 | - | - |
| Mohan et al. (2019) | DrionsDB | FCN | bootstrapped CE, Dice loss | - | - | - |
| Mohan et al. (2018) | MESSIDOR | FCN | bootstrapped CE, Dice loss | 95.7 | - | - |
| Mohan et al. (2019) | MESSIDOR | FCN | bootstrapped CE, Dice loss | - | - | - |
| Jiang et al. (2020) | SCES | atrous CNN, RPN | Smooth L1 loss, BCE | - | - | - |
| Sedai et al. (2017a) | EyePACS | VAE | negative KL-divergence, BCE | - | - | - |

The patch discriminator in pOSAL can learn abstract spatial and shape features from the label distribution of the source domain. In their following work, Wang et al. (2019b) proposed an unsupervised domain adaptation network named Boundary and Entropy-Driven Adversarial Learning (BEAL). Based on the observation that predictions in the target domain made by a network trained on the source domain tend to contain ambiguous and inaccurate boundaries, and that the corresponding entropy map is noisy with high-entropy outputs, they introduced two segmentation branches focused on the boundary and the entropy map, respectively. An adversarial learning method was introduced to encourage the predictions of the two branches to be domain-invariant. Liu et al. (2019d) proposed an unsupervised domain adaptation architecture named Collaborative Feature Ensembling Adaptation (CFEA). Their framework consists of three parts: the source domain network (SN), which learns from the source domain with labels, and the target domain student network (TSN) and target domain teacher network (TTN), which learn from the target domain without labels. Adversarial learning was introduced between the SN and TSN, where the supervised SN enables the segmentation network to obtain more precise predictions, and the unsupervised TSN introduces a perturbation into the training of the network. The MSE between the TSN and TTN predictions was calculated to aid the student network's learning.
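The student-teacher coupling in CFEA can be sketched as below, assuming a standard mean-teacher setup with an exponential-moving-average (EMA) teacher; the exact update rule in the paper may differ:

```python
import numpy as np

def ema_update(teacher, student, alpha=0.99):
    # Exponential moving average of the student's weights (assumed update
    # rule; a common way to maintain the teacher in mean-teacher setups).
    return [alpha * t + (1 - alpha) * s for t, s in zip(teacher, student)]

def consistency_mse(student_pred, teacher_pred):
    # MSE consistency between the two target-domain predictions.
    return float(np.mean((student_pred - teacher_pred) ** 2))
```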
There are several other approaches for addressing vessel and OD segmentation: using a variational autoencoder (VAE) as the backbone, tackling a novel task named optic disc quantification, and using a single network.
Approaches based on a variational autoencoder.
Note that some works are built on VAEs. Sedai et al. (2017a) proposed a semi-supervised method to perform OC segmentation using limited labeled training samples. First, a generative variational autoencoder (GVAE) is trained on a large amount of unlabeled data to learn a feature embedding. Then a segmentation variational autoencoder (SVAE) predicts the OC mask by leveraging the feature embedding provided by the GVAE.
OD quantification based on a multitask ensemble learning framework.
Unlike traditional OD/OC segmentation, optic disc quantification refers to the simultaneous quantification of a series of medical indicators, namely two vertical diameters (OD, OC), two complete regions (disc, rim), and 16 local regions (Garway-Heath and Hitchings, 1998). Accurate optic disc quantification provides effective help in the diagnosis and treatment of many eye diseases, such as chronic glaucoma (Maninis et al., 2016). Zhao et al. (2019c) proposed a multitask ensemble learning framework (DMTFs) to perform optic disc quantification. To the best of our knowledge, they were the first to use deep learning for this task. In their following work (Zhao and Li, 2020), they made several modifications to the original model, including incorporating a feature interaction module for highly correlated tasks.

Vessel and OD segmentation using a single net.
Maninis et al. (2016) proposed Deep Retinal Image Understanding (DRIU) for vessel and OD segmentation. They used VGG-16 as the "base network" and removed its FC layers. Feature maps from different levels of the base network are resized and fused to form two task-specific "specialized" layers, which perform vessel segmentation and OD segmentation simultaneously.

Compared to other segmentation tasks, such as vessel or lesion segmentation, OD/OC segmentation is more similar to natural image segmentation due to the elliptical shape of the target. Therefore, several architectures taken from natural image segmentation have been used for this task, including DeepLabv3+ and Mask R-CNN; such networks are rarely seen in vessel or lesion segmentation. Further, compared with the other two segmentation tasks, research on OD/OC segmentation is the most complete: FCN, U-Net, DeepLabv3+ and Mask R-CNN architectures have all been used, and methods such as multiscale inputs and polar transformations have been tried.

Future work on OD/OC segmentation may lie in the following directions. First, the more precise task of OD quantification may be promising. Second, the problems of domain shift and poor generalization performance still exist, so domain adaptation deserves more attention. It is also worth noting that the REFUGE2 challenge is ongoing; researchers can follow the competition to see the latest methods and research directions.
The fovea is one of the most significant landmarks in fundus images. Segmenting the fovea can help define the risk of a particular lesion in retinal diagnosis. However, due to the lack of publicly available datasets, few works focus on fovea segmentation.

Sedai et al. (2017b) proposed a two-stage framework for fovea segmentation, using a subset of EyePACS as training data. The first stage is a coarse network, which performs coarse segmentation to localize the fovea region. The authors discarded the FC layers of VGG-16 to make it a fully convolutional network. Feature maps at different levels are upsampled to the same size as the input images and fused together to get the output. The second stage is a fine network, which takes the ROI regions obtained by the coarse network and generates the final result. The only difference between the fine network and the coarse network is that the fine network only uses the last two blocks of VGG-16 to produce the segmentation output.

Clearly, only one remarkable work has been introduced for fovea segmentation, so more research is called for to improve performance on this task. Moreover, the framework used by Sedai et al. (2017b) is a two-stage architecture; more architectures need to be explored.

A/V classification
Subdividing the blood vessels in fundus images into arteries and veins is of vital importance for the early diagnosis of many diseases. For example, a low ratio between arteriolar and venular width (AVR) can predict diabetes and many cardiovascular diseases (Niemeijer et al., 2011).

The widely used datasets for this task are as follows. Compared to the DRIVE dataset used in vessel segmentation, the DRIVE-AV dataset (Hu et al., 2013) further provides pixel-level artery/vein labels. As mentioned before, it has 20 images in the training set and 20 images in the test set. DRIVE-AV is also called RITE in some papers. The LES-AV dataset (Orlando et al., 2018) consists of 22 images with pixel-level labels. The INSPIRE-AVR dataset (Dashtbozorg et al., 2014) consists of 40 images with only centerline-level annotation. The private IOSTAR dataset (Abbasi-Sureshjani et al., 2015) consists of 24 images annotated by two experts.

Galdran et al. (2019) regarded the A/V classification task as a four-class segmentation problem, with categories including background, artery, vein and uncertain. They used a U-Net-like structure to classify the arteries and veins directly, without segmenting the vessel tree first. To the best of our knowledge, they were the first to focus on pixel-level uncertainty in the task of vascular segmentation and classification. Raj et al. (2020) proposed an Artery-Vein Net (AV-Net) for A/V classification. The backbone network is ResNet-50, and squeeze-excitation (SE) blocks are used. Feature maps of different scales are upsampled to the size of the input image and fused to produce the segmentation map. AV-Net does not need a segmented vasculature map as input, requiring only a single-wavelength color fundus image. Finally, as introduced previously, Ma et al. (2019) performed A/V classification while segmenting blood vessels.
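As a minimal illustration of the AVR biomarker mentioned above, a plain-mean version can be computed as follows (clinical practice uses the revised Knudtson formulas for CRAE/CRVE, which are omitted here):

```python
def avr(artery_widths, vein_widths):
    # Arteriolar-to-venular ratio from vessel-calibre measurements,
    # simplified to the ratio of plain means.
    mean_a = sum(artery_widths) / len(artery_widths)
    mean_v = sum(vein_widths) / len(vein_widths)
    return mean_a / mean_v
```

A low value (veins much wider than arteries) is the risk marker referred to in the text.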
Discussion.
From the above methods we can see that A/V classification is a promising direction. The general tendency is to perform A/V classification directly, without performing vessel segmentation first. However, A/V classification is an even more challenging task than vessel segmentation. Furthermore, existing works still suffer from artery and vein labels appearing together within a single vessel segment, which is rare in reality.
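A simple post-processing step that removes mixed labels within a single segment is a per-segment majority vote; this is an illustrative fix under assumed integer labels (0 = artery, 1 = vein), not a method from the surveyed papers:

```python
import numpy as np

def enforce_segment_consistency(av_labels, segment_ids):
    # Relabel every vessel segment with the majority artery/vein vote of
    # its pixels, so no segment carries mixed labels.
    out = av_labels.copy()
    for seg in np.unique(segment_ids):
        mask = segment_ids == seg
        votes = np.bincount(av_labels[mask])
        out[mask] = np.argmax(votes)
    return out
```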
4. Disease diagnosis / grading

Diabetic retinopathy (DR) is a vascular disease that affects the normal blood vessels in the eye and is the leading cause of preventable blindness worldwide (Wilkinson et al., 2003). There is a unified standard for DR classification, namely the International Clinical Diabetic Retinopathy Scale (ICDRS). According to this standard, the severity of DR can be graded into five classes: 0 (no apparent DR), 1 (mild DR), 2 (moderate DR), 3 (severe DR) and 4 (proliferative DR). The most commonly used datasets are shown in Tab. 1; all of them have been introduced in Section 2. The experimental results are shown in Tab. 17.

There are several clinical-style papers, usually found in clinical journals such as JAMA and Diabetes Care. These papers usually pay more attention to actual clinical meaning than to network architecture improvements, and most of the training datasets were collected by the authors rather than taken from public datasets.

David et al. (2016) proposed a system that can automatically detect DR, called IDx-DR X2.1. It applies a set of CNN-based detectors to each image during detection. Their CNN structure is inspired by AlexNet and VGG and is able to predict four labels, namely negative (no or mild DR), referable DR (rDR), vision-threatening DR (vtDR), and low exam quality (protocol errors or low-quality images). CNN-based anatomy detectors can further detect hemorrhages, exudates and other lesions. Gargeya and Leng (2017) also used a CNN to perform DR binary classification. They used a CNN containing five residual blocks to extract image features; the features extracted by the deep CNN, together with metadata information, were fed into a decision tree model for binary classification. Li et al. (2018b) used a deep learning algorithm (DLA) for the detection of referable DR.
For the training and validation sets, they collected 71,043 images from a website named LabelMe and invited 27 ophthalmologists to annotate them. They used four Inception-v3 networks for different tasks, namely 1) classification of vision-threatening referable DR, 2) classification of DME, 3) evaluation of image quality for DR, and 4) assessment of image quality and of the availability of the macular region for DME.

Ensemble strategies are commonly used in this area. Gulshan et al. (2016) used a CNN for binary classification of with/without DR. They used a dataset of 128,175 images, which were annotated three to seven times by 54 experts. The specific network uses the Inception-v3 structure, in an ensemble of ten networks trained on the same data; the final result is the average of all network outputs. Krause et al. (2017) used a CNN for five-class classification of DR. Their improvements over Gulshan et al. (2016) include using Inception-v4 instead of Inception-v3, using a larger dataset during training, and using higher-resolution input images. Their network is also an ensemble of ten networks. Zhang et al. (2019b) established a high-quality labeled dataset and adopted an ensemble strategy to perform two-class and four-class classification. Features extracted from different CNN models are passed through corresponding SDNN modules, which are defined as component classifiers; the features are then fused and fed into an FC layer to generate the final results.

Considering the internal correlation between the diagnosis of DR and the detection of hemorrhages, exudates and other lesions, many works also generate heatmaps of lesions while performing DR diagnostic grading. These methods comprise: generating lesion heatmaps, performing lesion segmentation at the same time, attention methods and two benchmark works.
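The ten-network ensembling used by Gulshan et al. (2016) and Krause et al. (2017) amounts to averaging class probabilities across the individually trained models; a minimal sketch (function name assumed):

```python
import numpy as np

def ensemble_predict(prob_list):
    # Average the per-class probabilities of the member networks;
    # the predicted grade is the argmax of the mean.
    mean = np.mean(prob_list, axis=0)
    return mean, int(np.argmax(mean))
```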
Generating lesion heatmaps.
Yang et al. (2017) proposed a two-stage DCNN that can simultaneously delineate the lesions and perform DR severity grading. The first stage is a local network, which extracts local features for lesion detection. The second stage is a global network for the grading of DR. A weighted lesion map is obtained from the local network and the original fundus images, and an imbalanced weighting scheme was introduced to pay more attention to lesion patches while performing DR grading. Gondal et al. (2017) adopted unsupervised learning to perform DR grading and generate lesion heatmaps using only image-level labels. Their main network uses the o_O solution (Mathis Antony, 2015), replacing the last dense layer with a global average pooling (GAP) layer; their way of generating heatmaps was mainly inspired by Zhou et al. (2016). Quellec et al. (2017) proposed a solution that generates a heatmap showing what role each pixel in an image plays in the image-level prediction. They can detect both image-level referable DR and pixel-level biomarkers. Their network's baseline is also the o_O solution, and a method called backward-forward propagation was proposed to optimize the parameters.

Table 17. Summary of several results for DR diagnosis/grading

| Reference | Dataset | Category | Backbone | Loss | SE/% | SP/% | AUC/% | Kappa/% |
| David et al. (2016) | Messidor-2 | 4 | CNN | - | - | - | - | - |
| Gulshan et al. (2016) | Messidor-2 | 2 | Inception-v3 | - | 87.0 | - | - | - |
| Gargeya and Leng (2017) | Messidor-2 | 2 | CNN | 2-class categorical CE | 87 | 94 | - | - |
| Wang et al. (2017) | Messidor | 5 | CNN | - | - | - | 95.7 | - |
| Lin et al. (2018) | Messidor | 5 | CNN | - | - | - | - | - |
| Gulshan et al. (2016) | EyePACS | 2 | Inception-v3 | - | 90.3 | - | - | - |
| Gargeya and Leng (2017) | EyePACS | 2 | CNN | 2-class categorical CE | 98 | 97 | - | - |
| Gargeya and Leng (2017) | E-Ophtha | 2 | CNN | 2-class categorical CE | 90 | 94 | 95 | - |
| Quellec et al. (2017) | E-Ophtha | 2 | CNN | - | - | - | 94.9 | - |
| Wang et al. (2017) | Kaggle | 5 | CNN | - | - | - | 85.4 | - |
| Lin et al. (2018) | Kaggle | 5 | CNN | - | - | - | - | 85.9 |
| Roy et al. (2017) | Kaggle | 5 | CNN | - | - | - | - | - |
| Yang et al. (2017) | Kaggle | 4 | CNN | - | - | - | - | - |
| Quellec et al. (2017) | Kaggle | 2 | CNN | - | - | - | - | - |
| Gondal et al. (2017) | DiaretDB1 | 2 | CNN | - | - | - | - | - |
| Foo et al. (2020) | SiDRP14-15 | 5 (no DR here) | U-Net, VGG16 | binary CE | - | - | - | - |
| Foo et al. (2020) | IDRiD | 5 (no DR here) | U-Net, VGG16 | binary CE | - | - | - | - |
| Lin et al. (2018) | private | 5 | CNN | - | - | - | - | - |
| Krause et al. (2017) | private | 5 (moderate or worse DR here) | Inception-v4 | - | - | - | - | - |
| Li et al. (2018b) | private | 2 | Inception-v3 | - | - | - | - | - |
| Zhang et al. (2019b) | private | 2 | CNN | CE | - | - | - | - |
| Zhang et al. (2019b) | private | 4 | CNN | CE | - | - | - | - |
| Gulshan et al. (2019) | hospital in Sankara | 2 | CNN | - | - | - | - | - |
| Gulshan et al. (2019) | hospitals in Aravind | 2 | CNN | - | - | - | - | - |

Performing lesion segmentation at the same time.
Foo et al. (2020) used an encoder-decoder network for DR grading and lesion segmentation. They replaced the encoder of U-Net with VGG-16, which has five groups of conv layers; correspondingly, the decoder is modified to mirror the encoder. This architecture can perform lesion segmentation naturally. For DR grading, they attached a GAP layer to the saddle layer of the network for classification. They further proposed a semi-supervised approach to increase the number of training images.
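Attaching a GAP layer to the saddle layer for classification, as Foo et al. (2020) do, reduces to pooling the (C, H, W) features and applying a linear classifier; a sketch with assumed names and shapes:

```python
import numpy as np

def gap_classify(feature_maps, fc_weights, fc_bias):
    # Global average pooling over the (C, H, W) saddle-layer features,
    # followed by a linear classification head.
    pooled = feature_maps.mean(axis=(1, 2))   # (C,)
    return fc_weights @ pooled + fc_bias      # (n_classes,)
```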
Attention methods.
Attention mechanisms are also commonly used in DR diagnosis and grading. Wang et al. (2017) proposed Zoom-in-Net, which can simultaneously perform five-class DR grading and generate attention maps highlighting lesions. Zoom-in-Net consists of three parts: a main net (M-Net) using Inception-ResNet as the backbone, which extracts features and outputs diagnostic results; an A-Net, which generates attention maps using only image-level supervision; and a C-Net, which simulates the zoom-in operation clinicians perform when examining images. Lin et al. (2018) proposed a framework based on anti-noise detection and attention-based fusion which can perform five-class DR grading. They first extract features using a CNN, then feed them into a designed center-sample detector to generate lesion maps. The lesion maps and original images are sent to the proposed attention fusion network (AFN), which learns weights for the original images and lesion maps to reduce the influence of unnecessary information.
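The attention-based fusion in AFN learns weights for the two inputs; a minimal stand-in that softmaxes two scalar scores and takes the weighted sum (the real AFN learns its weights with a subnetwork rather than from fixed scores):

```python
import numpy as np

def attention_fuse(image_feat, lesion_feat, scores):
    # Softmax the two learned scores into fusion weights, then take the
    # weighted sum of the image-branch and lesion-map-branch features.
    e = np.exp(scores - np.max(scores))
    w = e / e.sum()
    return w[0] * image_feat + w[1] * lesion_feat
```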
Benchmark works.
Although, according to prior knowledge, detecting related lesions is helpful for the diagnosis/grading of DR, lesion detection is actually a complex and difficult task, and there exists a trade-off between lesion detection and DR grading.

Li et al. (2019b) built a dataset called DDR. DDR is the only dataset considering both DR grading and lesion detection; it is the largest dataset for lesion detection and the second largest for DR grading. The authors evaluated ten state-of-the-art deep learning models on this dataset, including five classification models, two segmentation models, and three detection models. Although these methods achieved a maximum accuracy of 0.8284 in DR grading, their performance in lesion segmentation and detection was particularly poor, indicating that the detection or segmentation of lesions is a very challenging task. Ahmad et al. (2019) performed a benchmark work on Messidor-2. They evaluated eight state-of-the-art deep learning classification models and generated class activation maps (CAMs) of lesions at the same time. The results showed that there is a trade-off between classification and localization: as the networks' depth and parameter counts increased, they performed better in classification but worse in localization.

There are several other approaches for DR grading, including a bilinear strategy, a hybrid method, smartphone-based diagnosis and the IDRiD challenge.
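The CAMs referred to here follow Zhou et al. (2016): a channel-wise weighted sum of the last conv features using the FC weights of the target class. A sketch with assumed shapes:

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    # features: (C, H, W) final conv maps; fc_weights: (n_classes, C) of the
    # layer after global average pooling. CAM = weighted channel sum,
    # rescaled to [0, 1].
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # (H, W)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam
```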
Bi-linear strategy with attention mechanism.
Zhao et al. (2019e) proposed BiRA-Net to perform DR grading. In the introduced RA-Net, features extracted by ResNet are fed to a proposed attention net that focuses on the areas decisive for grading. A bilinear strategy was adopted to train two RA-Nets for more fine-grained classification.
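Generic bilinear pooling of two branch features (outer product, signed square root, L2 normalisation) conveys the idea behind the bilinear strategy; BiRA-Net's exact combination of the two RA-Nets may differ:

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    # Outer product of the two branch features, flattened, followed by the
    # usual signed square root and L2 normalisation.
    x = np.outer(feat_a, feat_b).ravel()
    x = np.sign(x) * np.sqrt(np.abs(x))
    n = np.linalg.norm(x)
    return x / n if n > 0 else x
```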
Hybrid method combined with manually designed features.
Roy et al. (2017) proposed a strategy that combines CNN and dictionary-based approaches for DR severity assessment. The activation values of the second fully connected layer (FC2) of the CNN are converted into a discriminative pathology histogram (DPH) and a generative pathology histogram (GPH), which consist of manually designed features with specific concerns. The two histogram feature vectors and the original-size image were fused with the CNN's FC2 response, and finally a decision tree classifier was used to obtain the final result.
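A histogram descriptor over FC2 activations, in the spirit of the DPH/GPH features, can be sketched as follows; the bin count and value range are illustrative assumptions:

```python
import numpy as np

def activation_histogram(fc_activations, n_bins=16, value_range=(0.0, 1.0)):
    # Normalised histogram of FC-layer activations, usable as a compact
    # hand-crafted descriptor alongside the raw CNN response.
    hist, _ = np.histogram(fc_activations, bins=n_bins, range=value_range)
    return hist / max(hist.sum(), 1)
```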
Smartphone-based diagnosis.
Natarajan et al. (2019) proposed an offline DR screening system running on a smartphone to detect referable DR. Users can download the app and get DR diagnosis results instantly. It is remarkable that the diagnosis of DR can be performed by such a low-cost device in such a convenient way; an offline system of this kind is of great significance to areas with limited medical resources.

IDRiD challenge.
Porwal et al. (2020) described the IDRiD dataset and outlined the setup of the challenge "Diabetic Retinopathy Segmentation and Grading" held at ISBI 2018. They also discussed a variety of deep learning models that stood out in the competition, as well as lessons learned from analysis of the submissions.
The diagnosis/grading of DR has been widely studied. There are several clinical-style papers in this field, in which large numbers of images were typically collected and labeled, and the significance of using deep learning in actual clinical diagnosis was assessed. From a technical point of view, the diagnosis/grading of DR is a classification task: all we need to do is predict a number indicating the stage of DR. However, providing only a single number may confuse clinicians, who also need to know why the network makes certain decisions and which regions are deemed decisive. Therefore, many works have focused on generating heatmaps or performing lesion segmentation at the same time. Other effective methods, such as attention mechanisms and hybrid methods, have also been explored. It is also worth noting that a remarkable smartphone-based offline diagnosis system has been created.

However, existing research still faces several shortcomings. Rather than being a fine segmentation, the heatmap is typically generated coarsely and cannot provide lesion labels. Therefore, performing lesion segmentation and DR diagnosis/grading at the same time is a promising direction. As discussed previously, though, there is a trade-off between DR diagnosis/grading and lesion segmentation, mainly because the high-level semantic features needed for the classification task tend to lack the spatial information required for segmentation. Managing this trade-off is important for this multi-task problem.

Glaucoma is one of the major causes of blindness worldwide. The number of glaucoma patients is expected to grow to 112 million by 2040. Because of its irreversibility, early screening for glaucoma is extremely important (Bourne et al., 2013; Tham et al., 2014). The datasets used in this field are shown in Tab. 11. The SINDI, REFUGE and SEED datasets were not introduced in Section 3.2.
The SINDI dataset (Fu et al., 2018b) was established to assess the risk factors of visual impairment in the Singapore-Indian community. It consists of 5,783 images, of which 5,670 are normal and 113 are glaucomatous. The REFUGE dataset (Orlando et al., 2020) was used in the REFUGE challenge; it contains 1,200 fundus images with segmentation ground truth and clinical glaucoma labels. The SEED study (Zheng et al., 2013) was conducted in southwestern Singapore between 2004 and 2011; the population included 3,353 Chinese, 3,280 Malay and 3,400 Indian adults aged 40 and older. Experimental results are shown in Tab. 18.
Similar to DR, there are several clinical applications of glaucoma diagnosis. Raghavendra et al. (2018) utilized an 18-layer CNN containing five conv layers to perform glaucoma classification, using 1,426 images obtained from Kasturba Medical College, Manipal, India. Liu et al. (2019c) proposed a deep learning system (DLS) named Glaucoma Diagnosis with Convoluted Neural Networks (GD-CNN), based on ResNet. They established a dataset named FIGD consisting of 241,032 images, and further proposed an online deep learning (ODL) system to improve the generalization ability of GD-CNN.
Glaucomatous optic neuropathy (GON) diagnosis is also widely studied. Li et al. (2018a) used deep learning to perform binary classification of GON. They downloaded 70,000 fundus images from the online dataset LabelMe and selected 48,116 images for annotation. They invited 27 qualified ophthalmologists for labeling and used Inception-v3 as the classification network. Phene et al. (2019) collected 86,618 images from several sources, including EyePACS, Inoveon, AREDS, UK Biobank and three hospitals in India. They invited 43 graders to perform image-level and feature-level labeling. They also used Inception-v3 and trained an ensemble of ten networks; their network can predict referable GON and the presence/absence of various ONH features at the same time.

OD / OC area
Based on the prior knowledge that the OD and OC areas can be helpful for diagnosing glaucoma, many methods have paid attention to these two regions. Applications can be subdivided into OD/OC segmentation and direct CDR estimation.

OD/OC segmentation.

dos Santos Ferreira et al. (2018) designed a texture descriptor with a CNN to diagnose glaucoma. They first used a U-Net to segment the OD area, then, inspired by domain knowledge from biology, designed a module called phylogenetic diversity indexes to extract semantic features, and finally used a CNN-based classifier for the diagnosis of glaucoma. Pal et al. (2018) designed G-EyeNet for the classification of glaucoma, which performs particularly well when the dataset is small. They first perform OD segmentation; the extracted ROIs are then fed to a U-Net-like architecture for image reconstruction. Finally, an FC layer followed by a softmax classifier was attached to the encoder for glaucoma classification.
Direct CDR estimation.
Zhao et al. (2020b) abandoned the intermediate step of segmenting the OD and OC and directly estimate the CDR from fundus images. In the proposed MFPPNet, fundus images are passed through three DenseBlocks, and the extracted features then go through a feature pyramid pooling module and a fully connected feature fusion module to learn and fuse multiscale features. Finally, random forest regression is used to perform CDR regression.
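For reference, the vertical CDR that MFPPNet regresses directly is conventionally derived from OD/OC masks as follows; this mask-based route is exactly the intermediate step the authors avoid:

```python
import numpy as np

def vertical_cdr(disc_mask, cup_mask):
    # Vertical cup-to-disc ratio computed from binary segmentation masks.
    def vdiam(mask):
        rows = np.where(mask.any(axis=1))[0]
        return rows.max() - rows.min() + 1
    return vdiam(cup_mask) / vdiam(disc_mask)
```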
Multi-branch methods are also widely explored for glaucoma diagnosis; the results of multiple networks with different focuses are fused together to achieve higher accuracy. Fu et al. (2018b) proposed the Disc-aware Ensemble Network (DENet), which contains four branches. The global image stream learns image-level global features, employing a ResNet-50 backbone and taking the original images as input. The second stream is a segmentation-guided network using a U-Net to segment the OD area as guidance for the other two branches; FC layers are connected to the saddle layer of the U-Net to output classification results. The local disc region stream and the disc polar transformation stream both use ResNet-50 as the classifier, the former taking the cropped disc region as input and the latter its polar-transformed version.

Table 18. Summary of several results for glaucoma diagnosis/grading

| Reference | Dataset | Backbone | Loss | SE/% | SP/% | ACC/% | BACC/% | AUC/% |
| Li et al. (2019a) / Li et al. (2020b) | RIM-ONE | CNN | K-L divergence function and CE | 84.8 | 85.5 | 85.2 | - | 91.6 |
| dos Santos Ferreira et al. (2018) | RIM-ONE, DRISHTI-GS | U-Net, CNN | - | 100 | 100 | 100 | - | - |
| Zhao et al. (2019d) | ORIGA | CNN | contrastive loss and hinge loss | - | - | - | - | - |
| Liao et al. (2020) | ORIGA | CNN | - | - | - | - | - | 88 |
| Li et al. (2019a) | LAG | CNN | K-L divergence function and CE | - | - | - | - | - |
| Pal et al. (2018) | DRIONS-DB | Encoder-decoder network | reconstruction loss and CE | - | - | - | - | - |
| Fu et al. (2018b) | SCES | U-Net, ResNet50 | Dice coefficient loss and CE | - | - | - | - | - |
| Fu et al. (2018b) | SINDI | U-Net, ResNet50 | Dice coefficient loss and CE | - | - | - | - | - |
| Raghavendra et al. (2018) | private | CNN | - | - | - | - | - | - |
| Li et al. (2018a) | private | Inception-v3 | - | - | - | - | - | - |
| Phene et al. (2019) | private | Inception-v3 | - | - | - | - | - | - |
| Chai et al. (2018) | private | FCN, CNN, Faster-RCNN | CE | - | - | - | - | - |
| Liu et al. (2019c) | private (FIGD) | ResNet | CE | - | - | - | - | - |

Chai et al. (2018) designed a multi-branch neural network (MB-NN) combining domain knowledge. MB-NN takes three branches as input; the first is a set of original images.
The second branch is the optic disc region generated by Faster-RCNN. The third branch contains domain knowledge features, which include image features such as CDR and PPA size, and non-image features such as age, intraocular pressure and eyesight.
There are also some inspiring studies that generate evidence maps when performing glaucoma diagnosis. The approaches include a weakly supervised method, a method using the LAG dataset containing evidence labels, and a multiscale method.
Weakly supervised method.
Zhao et al. (2019d) proposed a weakly-supervised multi-task learning method (WSMTL) to perform accurate evidence identification, optic disc segmentation and automated glaucoma diagnosis simultaneously. First, a skip- and densely-connected CNN is used to capture multiscale features. Then, the extracted features are fed to the proposed pyramid integration structure to generate high-resolution evidence maps. These evidence maps are passed to a constrained clustering branch which clusters pixels with relational constraints. The evidence maps are also fed to a fully-connected discriminator to diagnose glaucoma.
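The pyramid integration idea, fusing coarse activation maps into one high-resolution evidence map, can be sketched as follows; nearest-neighbour upsampling and simple averaging are assumptions here, not the paper's exact design:

```python
import numpy as np

def upsample_nn(m, out_h, out_w):
    """Nearest-neighbour upsampling of a 2D activation map."""
    rows = np.arange(out_h) * m.shape[0] // out_h
    cols = np.arange(out_w) * m.shape[1] // out_w
    return m[rows][:, cols]

def integrate_pyramid(maps, out_size=(32, 32)):
    """Fuse coarse-to-fine activation maps into one high-resolution
    evidence map by upsampling each level and averaging (a simplified
    stand-in for WSMTL's pyramid integration structure)."""
    acc = np.zeros(out_size)
    for m in maps:
        acc += upsample_nn(m, *out_size)
    return acc / len(maps)

# toy maps from three scales of the backbone
maps = [np.ones((4, 4)), np.ones((8, 8)) * 2, np.ones((16, 16)) * 3]
ev = integrate_pyramid(maps)
```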
Using a dataset containing evidence map labels.
Li et al. (2019a) established a large-scale attention-based glaucoma (LAG) dataset. LAG contains 5,824 fundus images with attention maps provided by ophthalmologists. They also proposed an AG-CNN to diagnose glaucoma. First, an attention prediction subnet was introduced to generate attention maps. In this subnet, multiscale and channel attention methods are utilized. Then, a pathological area localization subnet was designed to locate the pathological area, in which attention maps are embedded into the feature maps at each stage. Finally, the located pathological areas and predicted attention maps are concatenated together and fed to a glaucoma classification subnet to predict the binary glaucoma label. In their subsequent work (Li et al., 2020b), they extended the LAG dataset to 11,760 fundus images. They also proposed a weakly supervised learning strategy for AG-CNN.
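The embedding of an attention map into the feature maps of a stage can be sketched as spatial re-weighting; this is a simplification of AG-CNN's actual design (the real network also rescales the attention map to each stage's resolution):

```python
import numpy as np

def embed_attention(feat, att):
    """Weight a c x h x w feature map by a spatial attention map
    (h x w), broadcasting the normalised map over all channels
    (sketch of attention embedding, not AG-CNN's exact module)."""
    att = (att - att.min()) / (att.max() - att.min() + 1e-8)  # to [0, 1]
    return feat * att[None, :, :]  # broadcast over the channel axis

feat = np.random.rand(8, 16, 16)            # toy stage features
att = np.zeros((16, 16)); att[4:12, 4:12] = 1.0  # toy attention blob
out = embed_attention(feat, att)            # background responses suppressed
```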
Multiscale networks.
Liao et al. (2020) introduced a clinically interpretable ConvNet architecture (EAMNet) for glaucoma diagnosis. They first used a CNN backbone with several residual blocks to extract useful features. Then a method named Multi-Layers Average Pooling (M-LAP) was proposed to bridge the gap between low-level localization information and high-level semantic information. Moreover, evidence activation maps (EAMs) were obtained by weighted summation of feature maps.
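The weighted summation behind the evidence activation maps is CAM-style in spirit. A minimal NumPy sketch, with illustrative feature and weight shapes rather than EAMNet's actual ones:

```python
import numpy as np

def evidence_map(feats, weights):
    """CAM-style evidence map: a weighted sum over the channels of a
    c x h x w feature map, using per-channel classifier weights
    (a sketch of how an evidence activation map can be formed)."""
    eam = np.tensordot(weights, feats, axes=(0, 0))  # -> h x w
    eam -= eam.min()
    return eam / (eam.max() + 1e-8)  # normalise to [0, 1] for display

feats = np.random.rand(32, 7, 7)   # toy last-layer feature maps
weights = np.random.rand(32)       # toy class weights from an FC layer
eam = evidence_map(feats, weights)
```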
Like in DR grading, there are several glaucoma diagnosis papers that care more about clinical applications, as discussed in Section 4.2.1. Further, Sections 4.2.2, 4.2.3, and 4.2.4 can all be regarded as focusing on the OC area from different aspects. Section 4.2.2 describes methods that perform OD/OC segmentation or CDR estimation and glaucoma diagnosis simultaneously. In Section 4.2.3, OD/OC segmentation serves as a branch to guide the glaucoma diagnosis task. Finally, in Section 4.2.4, heatmaps are generated to highlight decisive regions for glaucoma diagnosis.

However, there is still room for improvement in the diagnosis of glaucoma. First, just like in DR diagnosis, while heatmaps can be used to provide guidance for diagnosis, the more accurate task of OD/OC segmentation should be emphasized. Second, the diagnosis of glaucoma does not lie only in CDR estimation; several other factors can affect the result, such as age, race and family history. However, few works focus on these factors.

Age-related macular degeneration (AMD) is the leading cause of vision loss among people aged 50 and above; 6.2 million people worldwide suffered from AMD in 2015 (Vos et al., 2016). The datasets used in this field are shown in Tab. 19. AREDS is widely used in AMD diagnosis. The AREDS dataset (The Age-Related Eye Disease Study Research Group, 1999) consists of over 206,500 images acquired from 5,208 participants. iChallenge-AMD was used as the dataset of the iChallenge competition. It consists of 1,200 images, of which 77% are from non-AMD subjects and 23% are from AMD patients. Labels for AMD/non-AMD, disc boundaries, fovea locations and lesion boundaries are provided. The KORA dataset (Brandl et al., 2016) was acquired from 2,840 individuals aged 25 to 74 years old from South Germany. Experimental results are shown in Tab. 20.

Burlina et al. (2016) were among the very first to use deep learning for AMD diagnosis.
They used a pre-trained OverFeat DCNN to map original images into a 4,096-dimensional feature vector. The vectors were then passed through a linear SVM classifier to output binary AMD classification results, namely disease-free/early stages versus referable intermediate/advanced stages. In their following work (Burlina et al., 2017), they extended the previous method using datasets about 10 to 20 times larger. Horta et al. (2017) made several modifications to the method of Burlina et al. (2016). They added side-channel features such as sunlight exposure, education and gender. Further, they used two inscribed rectangles of the fundus images at different scales as input, and thus obtained 8,192-dimensional feature vectors. However, the dimension of the side-channel features was much smaller than 8,192. To alleviate this imbalance, they used PCA for dimension reduction. These features were then fused together to train a random forest classifier for final AMD classification.

Govindaiah et al. (2018) evaluated the performance of deep learning networks on two-class (no or early AMD vs. intermediate or advanced AMD) and four-class (no AMD, early AMD, intermediate AMD and advanced AMD) AMD classification. The networks evaluated include VGG-16 with transfer learning, VGG-16 without transfer learning, and ResNet-50. The experimental results showed that, for both two-class and four-class classification, VGG-16 without transfer learning performs best. Tan et al. (2018) designed a 14-layer CNN for early AMD diagnosis. Their data was obtained from the Ophthalmology Department of Kasturba Medical College (KMC) and included 402 normal images, 583 retinal images with early or intermediate AMD or GA, and 125 retinal images with evidence of wet AMD. Burlina et al. (2018) used deep learning for detailed severity characterization and estimation of five-year risk among patients with AMD. The classification network used was ResNet-50.
For AMD severity scales, they employed four-step and nine-step scales, respectively. For the estimation of five-year risk of progression to advanced AMD, they evaluated three deep learning-based strategies, namely Soft Prediction, Hard Prediction and Regressed Prediction.
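The PCA rebalancing used by Horta et al. before fusing deep and side-channel features can be sketched as follows; the dimensions are shrunk for illustration, and the side-channel names are the examples given above:

```python
import numpy as np

def pca_reduce(X, k):
    """Project an n x d feature matrix onto its top-k principal
    components via SVD (a sketch of the dimension reduction applied
    to the deep features before fusion)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

deep = np.random.rand(100, 512)   # stand-in for the 8,192-dim deep features
side = np.random.rand(100, 6)     # e.g. age, gender, sunlight exposure, ...
fused = np.hstack([pca_reduce(deep, 16), side])  # balanced input for the RF
```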
Ensemble strategies have also been used in AMD diagnosis. Grassmann et al. (2018) used an ensemble network for 13-class AMD classification. Six different CNNs (AlexNet, VGG-16, GoogLeNet, Inception-v3, ResNet and Inception-ResNet-v2) were trained independently on the pre-processed images, and the results of the six networks were fused using random forests.

Guidance from lesion detection is helpful for AMD diagnosis. Peng et al. (2018) proposed DeepSeeNet to grade the severity of AMD (0-5). Their network consists of three parts: Drusen Net (D-Net) for detecting drusen in three sizes (none/small, medium and large), Pigment-Net (P-Net) for detecting pigment abnormalities (hypopigmentary or hyperpigmentary), and Late AMD-Net (LA-Net) for detecting the presence of late AMD (neovascular AMD or central GA). The three subnetworks all use an Inception-v3 structure.

In this section, we have discussed approaches for AMD diagnosis, all of which are based on a CNN architecture. Hybrid methods, ensemble strategies and guidance from lesion detection have also been explored. However, there are still several limitations. First, the attention paid to AMD does not match its prevalence and severity: there is much less research on AMD diagnosis than on DR and glaucoma. Second, the datasets and the number of images used for AMD diagnosis are far fewer than those for DR and glaucoma. Finally, the actual amount of data available on the datasets' websites is inconsistent with what is claimed in the original papers.
Diabetic macular edema (DME) is the most common complication of DR and may cause severe vision loss (Ciulla et al., 2003). Two approaches addressing this task use a two-stage architecture and a multiscale method, respectively.
Two-stage architecture.
Mo et al. (2018) proposed cascaded deep residual networks for DME diagnosis. The datasets used include e-ophtha EX and the Hamilton Eye Institute Macular Edema Dataset (HEI-MED). The HEI-MED dataset (Giancardo et al., 2012) consists of 169 images, of which 115 are healthy and 54 contain exudates. Their framework consists of two stages. The first stage is an exudate segmentation network adopting a deep fully convolutional residual network (FCRN). Then a fixed-size region is cropped, centered on the pixel with the maximal probability value. The cropped region is fed to the second stage, a deep residual network performing binary classification.

Table 19. Widely used datasets for AMD diagnosis/grading

| Dataset name | Number of images | Resolution | Camera | Availability |
|---|---|---|---|---|
| AREDS | over 206,500 | - | - | available online |
| iChallenge-AMD | 1,200 | - | - | available on registration |
| KORA | images from 2,840 individuals | - | - | available online |

Dataset links: https:// /projects/gap/cgi-bin/study.cgi?study id=phs000001.v3.p1 ; http://ai.baidu.com/broad/introduction?dataset=amd ; https://epi.helmholtz-muenchen.de/

Table 20. Summary of several results for AMD diagnosis/grading

| Reference | Dataset | Backbone | Loss | Category | SE/% | SP/% | ACC/% | AUC/% | Kappa/% |
|---|---|---|---|---|---|---|---|---|---|
| Burlina et al. (2016) | AREDS | CNN with SVM | - | 2 (1 vs. 3,4) | - | - | - | - | - |
| Burlina et al. (2017) | AREDS | CNN with SVM | - | 2 | - | - | 88.4∼ | - | - |
| Horta et al. (2017) | AREDS | CNN with RF | - | 2 | 66.34 | 88.95 | 79.04 | 84.76 | - |
| Govindaiah et al. (2018) | AREDS | CNN | - | 2 | - | - | 92.5 | - | - |
| Govindaiah et al. (2018) | AREDS | CNN | - | - | - | - | - | - | - |
| Burlina et al. (2018) | AREDS | ResNet50 | Regression loss | 4 | - | - | 75.7 | - | - |
| Peng et al. (2018) | AREDS | Inception-v3 | - | 6 | - | - | - | - | - |
| Burlina et al. (2018) | AREDS | ResNet50 | Regression loss | 9 | - | - | - | - | - |
| Grassmann et al. (2018) | AREDS, KORA | CNN | weighted κ metric | 13 | - | - | - | - | - |
| Tan et al. (2018) | Collected | CNN | - | 2 | - | - | - | - | - |
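The hand-off between Mo et al.'s two stages, cropping a fixed-size region around the most confident exudate pixel, can be sketched as follows (the 64-pixel crop size is illustrative, not taken from the paper):

```python
import numpy as np

def crop_at_max(image, prob, size=64):
    """Crop a size x size patch centred on the pixel with the highest
    exudate probability, clamping the window to the image border
    (a sketch of the cascade's first-to-second-stage hand-off)."""
    h, w = prob.shape
    y, x = np.unravel_index(np.argmax(prob), prob.shape)
    top = min(max(y - size // 2, 0), h - size)
    left = min(max(x - size // 2, 0), w - size)
    return image[top:top + size, left:left + size]

img = np.random.rand(256, 256, 3)
prob = np.zeros((256, 256)); prob[200, 10] = 1.0  # peak near the border
patch = crop_at_max(img, prob)  # stays fully inside the image
```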
Multiscale networks.
He et al. (2019) proposed DME-Net, based on the multiscale method, for DME classification. They used IDRiD and MESSIDOR as their datasets. They first passed fundus images through a U-Net to generate fovea and hard exudate region masks. Then a multiscale feature extraction module using VGG-16 as the backbone was designed, in which a GAP operation is applied to the feature maps of each stage and the pooled features are concatenated to obtain multiscale features. They passed the original fundus images, the obtained fovea and hard exudate region masks, and the macular region cropped from the fundus images through the proposed multiscale feature extraction modules, respectively. The features were fused together and fed to an XGBoost classifier (Chen and Guestrin, 2016) to output the final results.
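The per-stage global average pooling and concatenation can be sketched as follows; the VGG-16-like channel counts below are assumed for illustration:

```python
import numpy as np

def multiscale_gap(stage_feats):
    """Global-average-pool the c x h x w feature maps of every backbone
    stage and concatenate the pooled vectors into one multiscale
    feature (sketch of DME-Net's multiscale feature extraction)."""
    return np.concatenate([f.mean(axis=(1, 2)) for f in stage_feats])

# toy stage outputs with VGG-16-like shapes (assumed, not from the paper)
stages = [np.random.rand(c, s, s)
          for c, s in [(64, 112), (128, 56), (256, 28), (512, 14)]]
feat = multiscale_gap(stages)  # fixed-length vector for the XGBoost classifier
```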
Discussion.
The two approaches introduced above both detect exudates as guidance for the DME diagnosis task. Therefore, the architectures used are both multi-stage. Possible future work for this task may be to design more lightweight architectures and reduce the number of trainable parameters. Another promising direction is to detect DME and DR at the same time; such works can be seen in Section 4.7.
Retinopathy of prematurity (ROP) is an eye disease that often occurs in infants with low birth weight or premature birth. It is the main cause of childhood blindness (Tasman et al., 2006). Brown et al. (2018) used deep CNNs to diagnose plus disease in ROP. Plus disease is defined as arterial tortuosity and venous dilation of the posterior retinal vessels that is greater than or equal to that found in a standard published retinal photograph (for Retinopathy of Prematurity Cooperative Group, 1988). The presence of plus disease is the most critical feature of severe, treatment-requiring ROP. Their training dataset contains 5,511 images and formed part of the multicenter Imaging and Informatics in Retinopathy of Prematurity (i-ROP) cohort study. They first used a U-Net for preprocessing, then an Inception-v1 architecture was employed to diagnose plus disease. Experimental results showed that the deep CNNs outperformed six of the eight invited ROP experts. Taylor et al. (2019) used deep learning to objectively monitor ROP progression. Their data was also from the i-ROP study. In their work, a quantitative ROP vascular severity score was developed based on previous work (Brown et al., 2018). Tracking this quantitative severity score may be an effective method for identifying patients at risk of disease progression.

Three-stage architectures have been shown to be suitable frameworks for ROP diagnosis. Hu et al. (2019) used deep learning to classify ROP. To address the problem of insufficient labeled data, they collected 2,668 examinations obtained from 720 infants. Each examination consists of several fundus images from different views. Because only a few of the n images in one ROP examination may contain features that can diagnose ROP, it is necessary to extract features from all the images in one examination and fuse them. To this end, they designed a network which is divided into three stages: feature extraction, feature fusion and classification.
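The feature fusion stage, which combines the n per-image feature maps by an element-wise max or mean, can be sketched as:

```python
import numpy as np

def fuse_exam_features(feats, mode="max"):
    """Fuse the n h x w x c feature maps extracted from the n images of
    one ROP examination into a single h x w x c map, using the
    element-wise max or mean considered by Hu et al. (2019)."""
    stack = np.stack(feats)  # n x h x w x c
    return stack.max(axis=0) if mode == "max" else stack.mean(axis=0)

# three toy per-image feature maps from one examination
feats = [np.full((7, 7, 4), v, dtype=float) for v in (1.0, 2.0, 3.0)]
fmax = fuse_exam_features(feats, "max")
fmean = fuse_exam_features(feats, "mean")
```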
In the feature extraction stage, there are a total of n CNNs with identical structures, which are used to extract features from the n images in one examination. In the feature fusion stage, they considered two fusion methods, max and mean, to fuse the n h×w×c features from the feature extraction stage into one h×w×c feature map. The final classification stage has a convolutional layer and a GAP layer for the final binary classification of ROP. For the network structure, they considered VGG-16, Inception-v2 and ResNet-50, and also experimented with different image resolutions.

Discussion.
From Hu et al. (2019)'s work, we can see that only a few images of an ROP examination contain useful features. This situation is similar to MRI and CT image processing, so researchers can borrow some inspiration from those tasks. However, publicly available datasets are still limited in number; more finely-annotated datasets are called for.
Cataracts can cause severe vision loss and are one of the most serious eye diseases that can cause blindness. Zhou et al. (2020) used deep learning for cataract diagnosis (non-cataract/cataract) and cataract grading (non-cataract, mild cataract, moderate cataract and severe cataract). They first established a dataset containing 1,335 images from Beijing Tongren hospital, China, of which 433 images are non-cataract and 922 have cataracts. They proposed discrete state transition (DST) and empirical DST (EDST) strategies. In the DST strategy, the weights and activation values are restricted to a unified discrete space, while in EDST they are restricted to an exponential discrete space. DST and EDST can reduce the networks' energy consumption and prevent overfitting. When using prior knowledge, they extracted improved Haar wavelet features and visible structure features from the fundus image as the input of DST-MLP and EDST-MLP. When not using prior knowledge, they used DST-ResNet and EDST-ResNet as classification networks. Xu et al. (2020) introduced a hybrid global-local representation CNN for cataract grading. They established a dataset consisting of 8,030 images, which were manually annotated by ophthalmologists. They first used AlexNet to learn global features, then used a deconvolutional network (DN) in each CNN layer to analyze which pixels contribute most to the classification and to explain the misclassified cases. Then a hybrid model, an ensemble of several AlexNets, was employed to combine global and local features.

Discussion.
Cataract diagnosis using fundus images is a promising direction. However, several limitations remain. First, there are no publicly available datasets. Second, there is no unified grading standard like in DR diagnosis. These limitations make it difficult to compare different works.

The diagnoses of different eye diseases may affect each other. For example, for a patient who has both glaucoma and cataracts, it may be difficult to diagnose the glaucoma because of the unclear biomarkers caused by the cataracts. Therefore, the diagnosis of multiple diseases may be a possible solution to this problem. Moreover, diagnosing multiple diseases simultaneously is more convenient and helpful to clinicians. The diagnosis of multiple diseases can be divided into simultaneous DR and DME diagnosis; simultaneous DR, glaucoma and AMD diagnosis; the diagnosis of eight diseases using paired CFPs; the diagnosis of 36 diseases; and rare pathologies detection.

Simultaneous DR and DME diagnosis.
Li et al. (2020c) proposed a cross-disease attention network (CANet), which can simultaneously diagnose DR and DME. CANet contains two different types of attention modules: a disease-specific attention module that selectively learns useful features for diagnosing a specific disease, and a disease-dependent attention module that further learns the internal relationship between the two diseases. Features extracted from ResNet-50 are passed through two disease-specific attention modules and two disease-dependent attention modules successively. They used IDRiD and MESSIDOR as their datasets. The diagnosis of the two diseases can be mutually enhanced. Tu et al. (2020) proposed a multi-task network named feature Separation and Union Network (SUNet) for simultaneous DR and DME grading. Experiments were carried out on the IDRiD dataset. They first used ResNet-34 to extract features for all tasks. Then a feature blending block was proposed, which contains a sequence of feature separation and feature union layers. The feature separation layers learn task-specific features, i.e., diagnosis features for the multi-disease diagnosis block (MD-Block) and lesion features for the lesion regularize net (LR-Net). The feature union layers learn useful union features for both branches. Finally, the MD-Block is used to predict the results of DR and DME grading.

Simultaneous DR, glaucoma, and AMD diagnosis.
Ting et al. (2017) used a CNN to perform referable DR, vision-threatening DR, possible referable glaucoma and referable AMD diagnosis simultaneously. Their dataset contains 494,661 retinal images, which were obtained from the ongoing Singapore National Diabetic Retinopathy Screening Program (SIDRP) (Nguyen et al., 2016). The backbone network used is VGG-Net.
Diagnosis of eight diseases using paired CFPs.
Li et al. (2020a) proposed a Dense Correlation Network (DCNet) to diagnose eight diseases using paired color fundus photographs (CFPs) from the ODIR dataset (https://github.com/nkicsl/OIA-ODIR). The ODIR dataset consists of 10,000 paired images from 5,000 Chinese patients. Eight kinds of labels denoting the stages of specific diseases are provided for each image. Here, paired refers to images of the left eye and the right eye from the same patient. DCNet consists of a shared CNN feature extractor for the paired CFPs (ResNet in this case), a spatial correlation module (SCM) and a final classifier. The SCM is utilized to capture dense correlations between the extracted features and fuse relevant ones.

Diagnosis of 36 diseases.
Wang et al. (2019d) used multi-task learning to diagnose 36 diseases simultaneously. To achieve this, they collected and relabeled 200,817 images with 36 categories, of which 17,385 images have more than one label. Their proposed network structure is divided into two stages. The first stage has a modified YOLO-v3 (Redmon and Farhadi, 2018) as the main structure, which is used to detect the macula and the OD/OC area. The second stage has three branches, namely the general task stream, macular task stream, and optic-disc task stream, which use the original images, the macula area, and the OD and OC area as inputs, respectively. The general task stream uses Inception-ResNet-v2 as the backbone network to detect general retinal diseases, fusing features from the other two streams. The macular task stream is used to detect macular diseases. It uses Inception-v3 as the backbone network and fuses features from the general task stream. The optic-disc task stream also uses Inception-v3 as the backbone network, but it is independent and does not fuse features from the other branches.
Rare pathologies detection based on few-shot learning.
Quellec et al. (2020) used few-shot learning to perform rare pathologies detection. They used the OPHDIAT dataset for training. This dataset (Massin et al., 2008) consists of 763,848 images acquired from the Ile-de-France area. DR grading is provided for every image, and the ophthalmologists also indicated their findings in free-form text. The images cover 41 conditions, some of which are rare pathologies. Based on the observation that CNNs trained to detect frequent conditions, such as DR, also cluster many other unrelated conditions in the feature space, the authors trained a CNN classifier and derived several simple probabilistic models from its feature space to detect rare conditions, solving the few-shot learning problem.
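A deliberately simplified stand-in for the kind of probabilistic model that can be derived from a frozen CNN's feature space is a nearest-centroid similarity; the actual models in Quellec et al. are more elaborate:

```python
import numpy as np

def rare_condition_score(query, support):
    """Score a query feature vector against the few available examples
    of a rare condition by distance to their centroid, mapped to a
    similarity in (0, 1].  A toy probabilistic model in the spirit of
    (but much simpler than) those derived from a frozen CNN's features."""
    centroid = support.mean(axis=0)
    scale = support.std() + 1e-8        # crude spread estimate
    d = np.linalg.norm(query - centroid)
    return float(np.exp(-d / scale))    # 1.0 exactly at the centroid

support = np.random.rand(5, 128)        # five examples of a rare condition
near = rare_condition_score(support.mean(axis=0), support)
far = rare_condition_score(support.mean(axis=0) + 10.0, support)
```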
Discussion.
The diagnosis of multiple diseases is of great significance in clinical practice. In this subsection, we first discussed the simultaneous diagnosis of DR and DME and the diagnosis of three different diseases. Then we introduced three works that focus on multiple disease diagnosis for up to 41 classes. The methods used vary but are all inspiring. In conclusion, this developing direction is very promising and deserves more attention, since more comprehensive systems are needed in practice. Therefore, more experiments should be carried out on newly built datasets such as ODIR.
5. Image synthesis
As mentioned before, the training datasets for medical imaging often consist of fewer images than those in other deep learning tasks. Further, high-quality annotated datasets are often costly to obtain. One possible solution is image synthesis. Image synthesis can increase the number of fundus images, help us to better understand the images and improve model performance.
Synthesis for glaucoma.
Deshmukh and Sivaswamy (2019) proposed a deep learning based method to synthesize the ONH region of fundus images. Given the OD, OC, and blood vessel segmentation masks from arbitrary fundus images, their method can generate high-quality images with vessels bending at the edges of the OC, as in real images. The generator contains four U-Nets: three parallel branches take an OC, OD and vessel mask as input, respectively, and their outputs are jointly passed through another U-Net to generate RGB images. The synthetic and original images, along with their corresponding OC, OD and vessel masks, compose the input of the discriminator, which employs a five-layer FCN as the backbone. For the datasets, they used Drishti-GS and DRIVE. Diaz-Pinto et al. (2019) used deep convolutional generative adversarial networks (DCGANs) to obtain a fundus image synthesizer and used a semi-supervised method for glaucoma assessment. Their system can not only generate synthetic images, but also provide labels automatically. They collected 14 datasets as the training set, including an unprecedented 86,926 images. Their two systems, DCGAN and SS-DCGAN, have similar structures, both containing a generator and a discriminator. The difference between them is that the DCGAN only performs image synthesis, while the SS-DCGAN can also predict glaucoma by changing the final output layer of the discriminator. Wang et al. (2019e) proposed a pathology-aware visualization strategy for glaucoma classification and a pathology-based GAN (Patho-GAN) for image synthesis. For brevity, the pathology-aware visualization net will not be described here. Different from the usual GAN setup, the synthetic images generated by the generator and the original images are passed through the pathology-aware visualization net and a pathological loss is calculated. Specific pathological areas can be enhanced by optimizing the pathological loss. They used LAG as their dataset.

Synthesis for vessel segmentation.
Costa et al. (2018) utilized a GAN to perform retinal image synthesis. An adversarial autoencoder is first trained to reconstruct vessel maps and learn a latent space associated with a normal distribution. The generated vessel maps are then passed to a GAN, whose generator is used to generate synthetic retinal images that can fool the discriminator. Synthetic pairs and real pairs are then passed to the discriminator. Once trained, a synthetic retinal image can be generated using the decoder of the adversarial autoencoder and the generator of the GAN, with a normal distribution as input. In their implementation, the vessel annotation-free Messidor-1 dataset and the DRIVE dataset were used. Zhao et al. (2018) proposed Tub-GAN and Tub-sGAN for retinal and neuronal image synthesis, which work well on small datasets. They aimed to learn a mapping from a tubular structured annotation to a synthetic image. In Tub-GAN, a GAN is employed with a vessel map ground truth and random noise as input. In Tub-sGAN, style transfer is incorporated, using VGG-Net to learn style features and content features. In Zhao et al.'s following work (Zhao et al., 2019a), they proposed R-sGAN to perform image synthesis for further segmentation of unannotated fundus images. Their framework consists of two stages. In the first stage, vessel maps from finely-annotated datasets and retinal images from unannotated datasets are passed to R-sGAN as input to generate retinal images that have the same style as the unannotated datasets. R-sGAN is a non-linear variant of GRU (Chung et al., 2014). Then, image pairs of vessel maps and generated retinal images are passed to a segmentation network for training. Once trained, the segmentation network can perform segmentation of unannotated images. Both works used the DRIVE, STARE and HRF datasets.

Synthesis for DR.
Zhou et al. (2019) proposed a diabetic retinopathy generative adversarial network (DR-GAN). It can generate high-resolution images given arbitrary DR grading and lesion information. They designed a generator conditioned on vessel and lesion masks to generate high-resolution images, and introduced a fine-grained design which aims to learn better and more realistic local details. A multiscale discriminator framework was also employed, containing three identical discriminators whose only difference is the resolution of the input images. For the datasets, they used EyePACS, IDRiD and DRIVE.

Synthesis for AMD.
Burlina et al. (2019) utilized a GAN to perform image synthesis for AMD. The GAN model was trained using 133,821 fundus images from AREDS as input. They invited two ophthalmologists to diagnose AMD on real images and synthetic images, and the results obtained were similar for the two. Moreover, a classification network trained only on the synthetic images showed similar performance to a network trained only on real images.
Smartphone camera image synthesis.
It is relatively easy and cost-effective to collect fundus images using a smartphone camera (SC). However, such images are often low-quality and suffer from uneven illumination, among other problems. V and Sivaswamy (2019) proposed a ResCycleGAN for image synthesis using SC images. Their modifications over CycleGAN (Zhu et al., 2017) are two-fold: they introduced a residual connection and proposed a structure-similarity based loss function. The dataset used consists of 540 images acquired using an iPhone 6.

Multimodal image reconstruction.

Different modalities provide complementary views of the same real-world object, and image reconstruction from one modality to another is a self-supervised task. On the one hand, important general features can be learned in the reconstruction process. On the other hand, images which are obtained invasively in practice can instead be obtained from non-invasive modal images. Hervella et al. (2018) employed a U-Net to perform image reconstruction from retinography to angiography. They used the publicly available Isfahan MISP dataset (Alipour et al., 2012; https://misp.mui.ac.ir/data/eye-images.html), which contains 59 retinography/angiography pairs.

Image super resolution (ISR).
ISR takes low-resolution images as input and outputs super-resolved (SR) images. This is useful for several downstream tasks, such as the detection of small or blurred lesions and biomarkers. Mahapatra et al. (2017) proposed an ISR method based on GANs. A local saliency map is obtained by combining abstraction, element distribution and uniqueness. Then a local saliency loss is calculated and added to the cost function. Entropy filtering is performed to highlight compact regions. They used DRIVE, STARE and CHASE_DB1 as their datasets.
Discussion.
Image synthesis based on deep learning is a relatively new task in fundus image processing. Synthetic images can be used to help training, which can improve performance and alleviate overfitting. In terms of architecture, nearly all approaches use GANs; the latest powerful variant, CycleGAN, can also be seen in some applications. Image synthesis will likely be a very popular direction in the near future, and it is hard to predict what kinds of explorations will be conducted using GANs.
6. Other applications
There are several works focusing on rare pathologies, such as pathological myopia and refractive error.
Pathological myopia is a common disease that can cause loss of vision. Guo et al. (2020b) introduced a lesion-aware segmentation network (LSN) to perform atrophy and detachment segmentation, which is related to pathological myopia. The architecture is a U-Net-like encoder-decoder network. The authors added a classification branch to the saddle layer to predict the existence of lesions. A feature fusion module, designed as a multiscale network, is used in the decoder. To further boost the sensitivity to lesion edges, they added a loss function named edge overlap rate (EOR). The training set was taken from the PALM challenge in ISBI 2019 (https://palm.grand-challenge.org/), consisting of 400 images.

Refractive error is one of the leading causes of visual impairment. Varadarajan et al. (2017) used deep learning to diagnose it. They used UK Biobank and AREDS as their datasets. The architecture is a combination of ResNet and soft attention (Xu et al., 2015). Their network can also generate attention maps. Results showed that the foveal region is one of the most important regions for making predictions.
Discussion.
It is good to see that these diseases have received some attention, despite not being as prevalent as DR, glaucoma, etc. Diagnoses of rare pathologies are also important, and the successful experience with other diseases is expected to extend well to them. Current challenges lie in the lack of data.
As shown in previous studies, the condition of the retina may reflect other diseases. In fact, many ophthalmologists have used ophthalmoscopes to diagnose systemic diseases such as hypertension, sarcoidosis and CMV infection (Schmidt-Erfurth et al., 2018). We thus discuss studies on systemic disease diagnosis as follows.
Cardiovascular risk factors.
Poplin et al. (2018) used a deep learning method to predict multiple cardiovascular risk factors, including age, gender, smoking status, systolic blood pressure (SBP) and so on. Their training dataset includes the data of 284,335 patients, and was collected from the UK Biobank and EyePACS. Inception-v3 was used as their classification network. Moreover, to help clinicians better understand the decision process of the CNN, they used a soft attention mechanism to generate heatmaps, which can highlight decisive regions in the process of CNN classification.
Ischemic strokes.
Lim et al. (2019) utilized deep learning methods to predict strokes from fundus images. Images positive for ischemic stroke were obtained from the MCRS study (Silva et al., 2009), while negative images were taken from five other fundus image datasets. Their classification network is VGG-16. They also adopted the feature isolation method. First, a U-Net was used to segment the vessel tree from the original images. Then, the vascular maps were used as the input of the classification network, providing additional information.
Annotation-free cardiac vessel segmentation.
Yu et al. (2019) proposed a knowledge transfer based shape-consistent generative adversarial network (SC-GAN) and a simpler Add U-Net for cardiac vessel segmentation. In SC-GAN, an average fundus image and a digital subtraction angiography (DSA) image are passed to a generator to obtain a synthetic image that has both retinal vessels and coronary arteries. A shape-consistent loss was proposed to ensure shape consistency. A discriminator was then trained using synthetic images and real DSA images as input. Finally, a U-Net was trained using synthetic DSA images with synthetic labels for cardiac vessel segmentation. In Add U-Net, a U-Net was trained using an average fundus image and a DSA image as input, and a combination of fundus image annotations and the Frangi segmentation (Frangi et al., 1998) results of DSA images as labels. They used DRIVE as the source domain and collected 1,092 coronary angiographies (DSA) with no annotations.
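As a rough illustration (not the authors' exact formulation), a shape-consistent term can be written as a soft-Dice penalty between the vessel mask recovered from the synthetic image and the source annotation, so that synthesis cannot distort vessel shape:

```python
import numpy as np

def shape_consistency_loss(pred_mask, source_mask, eps=1e-6):
    """Soft-Dice style penalty: the vessels segmented from the synthetic
    image should keep the shape of the source (fundus) annotation.
    Both inputs are binary or soft masks of the same shape."""
    pred = pred_mask.astype(np.float64).ravel()
    ref = source_mask.astype(np.float64).ravel()
    dice = (2.0 * np.sum(pred * ref) + eps) / (np.sum(pred) + np.sum(ref) + eps)
    return 1.0 - dice  # 0 when the shapes match perfectly
```

In an SC-GAN-style setup, such a term would be added to the adversarial loss so the generator is rewarded for realism and penalized for shape drift.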
Biological age estimation.
Biological age (BA) is a widely used aging biomarker. Liu et al. (2019b) developed a CNN classifier to estimate BA based on retinal images. Two datasets, named the Yangxi Dataset and the Shenzhen Dataset, were collected, containing 5,825 and 2,911 adults aged 50 years or older, respectively. They employed a detail manipulation method to enhance the global details of the non-specific global anatomical and physiological features related to aging. Then a VGG-19 network was used to estimate BA. They also proposed a joint loss to boost performance. Results showed that their method outperforms existing ‘brain age’ models.
Discussion.
The diagnosis of systemic diseases using fundus images is an encouraging direction. With the successful studies using multi-disease and smartphone-based offline diagnosis systems, there is promise of predicting systemic diseases in a remote, non-invasive, offline and convenient way.

Here we discuss approaches for three aspects of image processing, namely image registration, image enhancement and image quality assessment. All of these are important for the processing and selection of images.
Image registration.
Zou et al. (2020) proposed an unsupervised architecture for non-rigid retinal image registration. They formulated the image registration task as a parameterized deformation function; the aim is thus to regress the non-linear spatial correspondence between a pair of images. For this regression task, they proposed the Structure-Driven Regression Network (SDRN) framework, which utilizes a multiscale method to focus on global and local features simultaneously. They used the publicly available Fundus Image Registration (FIRE) dataset (Hernandez-Matas et al., 2017; https://projects.ics.forth.gr/cvrl/fire/), which consists of 129 retinal images forming 134 image pairs.

Image enhancement.
Image enhancement is another way to improve performance on existing datasets. Zhao et al. (2019b) proposed a data-driven strategy to enhance blurred fundus images in a weakly supervised manner. Their strategy uses two unpaired datasets for training, and their method is the first end-to-end deep generative model for blurred retinal image enhancement. They designed two generators with the same structure: one to enhance low-quality images to high-quality ones, and the other to convert high-quality images to low-quality ones for training reference. Similarly, they designed two corresponding discriminators with the same structure. They also introduced a dynamic retinal image feature limit to guide the generator to improve performance and avoid the over-enhancement of extremely blurred areas. They used a private dataset consisting of 550 blurry images and 550 high-quality images for training, and 60 blurry images for testing. The blurry images are from cataract patients and the high-quality ones are from normal people.
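This two-generator, two-discriminator setup is in the spirit of CycleGAN training on unpaired sets. Its core cycle-consistency term is sketched below with toy invertible "generators"; in the real method the generators are CNNs trained jointly with the discriminators, so the function names here are purely illustrative.

```python
import numpy as np

def cycle_consistency_loss(x_low, x_high, g_enhance, g_degrade):
    """L1 cycle loss for unpaired training: degrading an enhanced image
    (and enhancing a degraded one) should recover the original input."""
    cyc_low = np.abs(g_degrade(g_enhance(x_low)) - x_low).mean()
    cyc_high = np.abs(g_enhance(g_degrade(x_high)) - x_high).mean()
    return cyc_low + cyc_high
```

With exactly inverse toy mappings (gamma up vs. gamma down on [0, 1] intensities) the loss is zero; during training it is minimized alongside the adversarial losses of the two discriminators.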
Image quality assessment.
Retinal image quality assessment (RIQA) is important for ensuring the quality of images used by clinicians and deep learning systems. Fu et al. (2019) proposed the Multiple Color-space Fusion Network (MCF-Net) for RIQA. They first re-annotated an Eye-Quality (EyeQ) dataset with 28,792 images from EyePACS. Compared with other datasets, they extended binary labels (‘Accept’ and ‘Reject’) to ternary labels (‘Good’, ‘Usable’ and ‘Reject’). For the model architecture, they first transferred the original RGB color-space to the HSV and LAB color-spaces. Then, retinal images in the different color-spaces were passed to their corresponding base networks to extract features. Fusion was performed at both the feature level and the prediction level. Shen et al. (2020) proposed a domain-invariant interpretable fundus IQA system. In order to improve interpretability, they added three clinically accepted aspects (artifact, clarity and field definition) to the output, along with visual feedback. A coarse-to-fine architecture was introduced to locate landmarks, including the OD and fovea, for robustness. In order to generalize well on different datasets, they adopted a semi-tied adversarial discriminative domain adaption model. They collected their dataset from patients who participated in the Shanghai Diabetic Retinopathy Screening Program (SDRSP).

Discussion.
The above three directions are all significant in supporting other tasks, and the use of deep learning makes these image processing methods effective. Shen et al. (2020)’s work is particularly inspiring. Domain invariance and visualization, which are useful for clinicians, are also goals common to almost all other tasks. It is clear that deep learning is on the frontier of research, and more architectures and methods should be explored.
7. Conclusions and Discussions
As demonstrated in Sections 2 to 5, the performance of deep learning for fundus image diagnostic tasks is quite impressive. In fact, deep learning methods have even achieved better performance than experienced humans in some cases. Specifically, deep learning can provide helpful suggestions for ophthalmologists. It can detect and segment important biomarkers, such as lesions, blood vessels and the OD/OC, and provide evidence for physicians to diagnose specific diseases. It can also directly predict whether a patient has an ophthalmic disease, and can serve as a powerful assistant for physicians in the screening of glaucoma, DR, AMD and other ophthalmic diseases. Gulshan et al. (2019) compared the performance of deep learning with that of a human expert and a trained grader in DR diagnosis in two hospitals in India. Their results showed that the deep learning system generalizes well to real data from India. Further, for the first time in ophthalmology, it was verified in practice rather than on datasets that deep learning achieves comparable or even better performance than human experts. The excellent performance of deep learning makes it a promising replacement for traditional computer-aided diagnostic (CAD) systems. In fact, IDx (David et al., 2016) has already been approved by the US FDA for practical use. We believe that more deep learning methods will further be deployed as stable, efficient, and robust diagnostic systems for practical clinical diagnosis.

In terms of network structure, the classification backbone network has evolved from VGG and Inception-v1 to Inception-v2, Inception-v3, ResNet and DenseNet, while the segmentation backbone network has evolved from manually designed CNNs to FCNs and then to U-Net, Mask R-CNN, DeepLabV3+, etc. However, using deep learning in fundus image analysis is more than simply applying the backbone networks to specific tasks. Many practical problems have emerged.
For example, the number of pixels belonging to biomarkers such as lesions, the OD/OC and blood vessels is much smaller than that of the background. Further, the thin curved structure of blood vessels, especially capillaries, makes them hard samples. To solve these problems, specific methods are required according to the characteristics of each task. Advances in methods range from simple applications of deep learning to multi-branch, multiscale and coarse-to-fine networks, as well as attention mechanisms, and so on. We have summarized the approaches in Sections 2 to 5 according to tasks and methods, and provided an overall summary in Fig. 6.
Although the application of deep learning in the field of fundus image analysis has achieved gratifying performance, it is worth noting that limitations remain in many other aspects. The problems which restrict performance have not been solved, and the inherent limitations of deep learning also remain unresolved. Studying how to overcome these limitations will be a key issue in the future of this field. We have discussed limitations for specific tasks in each section. Thus, here we will only list limitations which are common to all tasks, and provide possible solutions to them.
As mentioned previously, deep learning is data-driven. There are many large-scale datasets in the field of natural image processing; for instance, ImageNet (Deng et al., 2009) has more than 14 million images. Fundus image datasets, however, are quite limited, as in other medical fields. Unlike natural images, the labeling of fundus images needs to be completed by experts and is very difficult. For example, because of the lack of depth information, an expert typically requires eight minutes to label a fundus image for OD/OC segmentation and glaucoma diagnosis (Lim et al., 2015). These limitations have resulted in a lack of high-quality labeled fundus images, and the smaller the dataset, the more likely training is to suffer from low accuracy and overfitting.

In addition to waiting for ophthalmologists to label more data, researchers can also seek measures to help alleviate this problem:
Weakly supervised learning.
Weakly supervised learning can solve the lack of high-quality labeled data to a certain extent. We saw in Sections 2 to 5 that weakly supervised methods have already been applied to the field of fundus images. Weakly supervised learning can be divided into incomplete, inexact and inaccurate supervision. Incomplete supervision means that part of the data is labeled while the other part is not; that is, the labels are incomplete. Active and semi-supervised learning can be used to solve this. Inexact supervision means the granularity of the annotation does not match the problem to be solved; multi-instance learning can be used to address this. Inaccurate supervision means that the annotation is not completely accurate, so there are samples with incorrect annotations; one can consider learning with noisy labels to solve this. For specific relevant methods please refer to Zhou (2018).
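For the incomplete-supervision case, a minimal self-training step (a common semi-supervised recipe, not a method from any specific paper above) selects confident predictions on unlabeled images and promotes them to pseudo-labels:

```python
import numpy as np

def pseudo_label(probs, threshold=0.9):
    """Self-training step for incomplete supervision: keep only the
    unlabeled samples whose top predicted class probability exceeds the
    confidence threshold, and return their indices and pseudo-labels."""
    conf = probs.max(axis=1)                 # per-sample top confidence
    keep = conf >= threshold
    return np.flatnonzero(keep), probs.argmax(axis=1)[keep]
```

Each round, the pseudo-labeled samples are added to the training pool and the model is retrained; the threshold trades label noise against coverage.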
Image synthesis and enhancement.
Unlike weakly supervised learning, which addresses the lack of high-quality annotations, image enhancement can improve the quality of the image, while image synthesis can directly generate realistic images, even with labels. The excellent performance of image generation is due to the use of powerful GAN models. In fact, GANs have become mainstream for many image generation problems, such as style transfer, image inpainting, super resolution, and so on. Section 5 introduced several specific examples of GAN-based image synthesis. We believe that as the quality of the generated images improves, more approaches will use them for training, which will in turn drive further research to improve the quality of the synthesized images.
Federated learning.
Creating a high-quality annotated dataset involves more than simply inviting ophthalmologists to annotate the data. It is a complicated matter with many other considerations, including data privacy, competition between research institutes and hospitals, and relevant laws and regulations. Note that many fundus image datasets are also private. How to achieve data sharing while satisfying diverse research groups, complying with regulations and not infringing upon user privacy is an urgent problem to be solved. In 2016, Google proposed federated learning to solve the “data islands” problem. Federated learning can be divided into horizontal federated learning, vertical federated learning and federated transfer learning. In horizontal federated learning, the data features of the two datasets overlap substantially while the users overlap little. In vertical federated learning, the two datasets have substantial user overlap but little feature overlap. In federated transfer learning, both users and data features overlap little. Federated learning is still a relatively new field to be explored; readers can refer to Kairouz et al. (2019) for more information.

The imbalance in fundus images is mainly an imbalance between foreground and background, or in the number of samples in different classes. A large portion of the methods introduced in Sections 2 to 5 were proposed to solve this problem. For instance, imbalance between foreground and background occurs in many fundus image tasks, with the number of pixels in lesions, blood vessels, and the OD and OC being much smaller than the number of pixels in their respective backgrounds. This imbalance directly increases the difficulty of training. For lesions and the OD/OC, using a detection network to extract the ROI as the input of the network can increase the proportion of the foreground. Such an approach can be seen in Chai et al. (2018), Fu et al. (2018b), Sarhan et al. (2019), Shah et al. (2019) and Wang et al.
(2019c). Using spatial attention, as done by Wang et al. (2017) and Zhao et al. (2019e), is also very common, and allows the network to focus on areas that are more decisive for solving the task. It is worth noting that Fu et al. (2018a) performed a polar coordinate transformation to alleviate the imbalance between foreground and background, based on the unique ellipse shape of the OD/OC region. An imbalance in the number of samples in different classes is also very common in fundus images. One solution is to use a class-balancing loss function, such as weighted cross-entropy loss or focal loss. Selective sampling is another direction that has been explored: it uses a carefully designed sampling strategy to maintain a certain proportion of samples in different classes during each training epoch, thereby avoiding imbalance. This strategy was used in van Grinsven et al. (2016), Gondal et al. (2017), Dai et al. (2018) and Sarhan et al. (2019).

There are certain differences between the various fundus image datasets, including the acquisition camera, resolution, light source intensity, parameter settings, and so on. These differences pose a challenge to the generalization performance of deep learning models. In fact, even some state-of-the-art models only perform well on certain datasets and degrade on others. This problem is mainly caused by the distribution difference between datasets, that is, domain shift (Ghafoorian et al., 2017). Domain adaption, introduced in Section 3.2.4, can be used to enhance a model’s performance on the target domain and solve the problems caused by domain shift. This strategy was used in Wang et al. (2019b), Wang et al. (2019c), and Liu et al. (2019d) to enhance the generalization performance of optic disc segmentation models. Domain adaption is still being explored, and there are various diverse methods for it. Readers can learn more from Wang and Deng (2018).
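Focal loss, one of the class-balancing losses mentioned above, can be sketched for the binary foreground/background case as follows; the α and γ values are the commonly used defaults, not values reported by any of the cited works:

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights well-classified (mostly background)
    pixels via the (1 - p_t)^gamma factor, so the rare foreground
    (lesion/vessel) pixels dominate the gradient."""
    p = np.clip(probs, 1e-7, 1 - 1e-7)           # numerical safety
    pt = np.where(labels == 1, p, 1 - p)          # prob. of the true class
    a = np.where(labels == 1, alpha, 1 - alpha)   # class weighting
    return float(np.mean(-a * (1 - pt) ** gamma * np.log(pt)))
```

With γ = 0 and α = 0.5 this reduces (up to a constant) to plain cross-entropy; increasing γ suppresses the contribution of easy background pixels.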
The excellent performance of deep learning comes at the cost of very high consumption, since the number of parameters is much larger than in traditional machine learning. This not only means that the models require significant computational resources and time during training, but also prevents them from being deployed on portable devices, such as binocular indirect ophthalmoscopes (Hajabdollahi et al., 2018). One direction for solving the high-consumption problem of deep learning is to design novel network structures and explore operations or layers with low computational load and low memory consumption. However, this is mainly the work of basic network researchers. Researchers in this field can directly apply or learn from mature models, for example using a lightweight network to decrease complexity; for instance, MobileNet (Sandler et al., 2018) could be used instead of ResNet as the backbone. The models could also be compressed by quantizing the network weights to reduce complexity. One approach is to quantize the weights and activation values in a CNN from 32-bit floats to low-bit numbers, or to constrain the weights and activations to binary values. Model pruning can also compress the models: this strategy sets some parameters in the model to zero and skips their calculation. In addition, there are methods such as Huffman coding for the weights of the model. Some researchers have also explored how to reduce the consumption of fundus image network models specifically. For example, Hajabdollahi et al. (2018) used weight quantization and pruning to reduce the complexity of a blood vessel segmentation model.

An important issue in the application of deep learning to actual medical systems is to what extent doctors accept its “black box”. This lack of interpretability is an inherent defect of deep learning. Fortunately, several studies have focused on this issue.
Approaches for addressing this can be divided into generating heatmaps, the clinical meaning of heatmaps, and other explorations.
Generating heatmaps.
The basic idea of several studies is to merge the feature maps of each layer of the deep network to generate a heatmap, called a class activation map (CAM) or evidence map. The generated heatmap shows which part of the image the deep network referred to when making its final judgment. Keel et al. (2019) tried to generate heatmaps for DR and GON diagnosis systems. They used a threshold strategy when generating the final probability map to visualize decisive regions for the prediction. The application of this idea can also be seen in Yang et al. (2017), Gondal et al. (2017), Quellec et al. (2017), Li et al. (2019a), Zhao et al. (2019d) and Li et al. (2020b).

Clinical meaning of heatmaps.
The generated heatmap not only provides a cue as to how deep learning makes decisions, but also provides guidance and assistance to the diagnostic process. Meng et al. (2020) explored how the process of generating heatmaps can improve both the performance of disease diagnosis and the explainability of the network. They first generated heatmaps using a gradient-based classification activation map (Grad-CAM) (Selvaraju et al., 2017). Then the network was fine-tuned using several designed losses and ophthalmologist intervention. Experimental results on a private dataset showed a performance improvement for the classification task. Sayres et al. (2019) evaluated the role of deep learning in guiding diagnosis. They invited 10 experts to grade DR in 796 fundus images. Each image had three forms: the original image without auxiliary information, the image with only grading results, and the image with grading results and heatmaps. Images were randomly assigned to different ophthalmologists. The results showed that the assistance of deep learning diagnostic results improves the accuracy and confidence of experts in diagnosing DR, especially with heatmaps.

Other explorations.
Note that there are several other studies on the topic of interpretability. Araújo et al. (2020) proposed a deep learning-based grading system named DR|GRADUATE. In addition to grading DR, it can also estimate how uncertain the prediction is. de La Torre et al. (2020) proposed a deep learning-based interpretable classifier for DR grading. In their classifier, a score similar to the concept of relevance is assigned to every point of the input and hidden spaces; the scores indicate the contributions to the final prediction. Niu et al. (2019) explored interpretability in the diagnosis of DR and borrowed some ideas from Koch’s postulates in infectious diseases.
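The class-activation-map idea running through the heatmap studies above reduces to a weighted merge of the final convolutional feature maps. Below is a simplified CAM sketch with placeholder inputs; Grad-CAM replaces the classifier weights with pooled gradients:

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Class activation map: combine the last conv feature maps (C, H, W)
    with the classifier weights (num_classes, C) of the chosen class, then
    rescale to [0, 1]; high values mark the regions that drove the
    prediction for that class."""
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # (H, W)
    cam -= cam.min()                     # shift so the minimum is 0
    if cam.max() > 0:
        cam /= cam.max()                 # normalize to [0, 1]
    return cam
```

In practice the low-resolution map is bilinearly upsampled to the fundus image size and overlaid as the "evidence map" shown to clinicians.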
Acknowledgments
This work is partially supported by the National Natural Science Foundation (61872200), the Natural Science Foundation of Tianjin (19JCZDJC31600, 18YFYZCG00060) and the Open Project Fund of the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (No. CARCH201905).
References
Abbasi-Sureshjani, S., Smit-Ockeloen, I., Zhang, J., ter Haar Romeny, B.M., 2015. Biologically-inspired supervised vasculature segmentation in SLO retinal fundus images, in: Kamel, M., Campilho, A.J.C. (Eds.), Image Analysis and Recognition - 12th International Conference, ICIAR 2015, Niagara Falls, ON, Canada, July 22-24, 2015, Proceedings, Springer. pp. 325–334. URL: https://doi.org/10.1007/978-3-319-20801-5_35.
Abramoff, M.D., Garvin, M.K., Sonka, M., 2010. Retinal imaging and image analysis. IEEE Reviews in Biomedical Engineering 3, 169–208.
Abràmoff, M.D., Folk, J.C., Han, D.P., Walker, J.D., Williams, D.F., Russell, S.R., Massin, P., Cochener, B., Gain, P., Tang, L., Lamard, M., Moga, D.C., Quellec, G., Niemeijer, M., 2013. Automated Analysis of Retinal Images for Detection of Referable Diabetic Retinopathy. JAMA Ophthalmology 131, 351–357. URL: https://doi.org/10.1001/jamaophthalmol.2013.1743.
Adem, K., 2018. Exudate detection for diabetic retinopathy with circular Hough transformation and convolutional neural networks. Expert Syst. Appl. 114, 289–295. URL: https://doi.org/10.1016/j.eswa.2018.07.053.
Ahmad, M., Kasukurthi, N., Pande, H., 2019. Deep learning for weak supervision of diabetic retinopathy abnormalities, in: 16th IEEE International Symposium on Biomedical Imaging, ISBI 2019, Venice, Italy, April 8-11, 2019, IEEE. pp. 573–577. URL: https://doi.org/10.1109/ISBI.2019.8759417.
Alipour, S.H.M., Rabbani, H., Akhlaghi, M., 2012. Diabetic retinopathy grading by digital curvelet transform. Comput. Math. Methods Medicine 2012, 761901:1–761901:11. URL: https://doi.org/10.1155/2012/761901.
Almazroa, A., Alodhayb, S., Osman, E., Ramadan, E., Hummadi, M., Dlaim, M., Alkatee, M., Raahemifar, K., Lakshminarayanan, V., 2018.
Retinal fundus images for glaucoma analysis: the RIGA dataset, in: Zhang, J., Chen, P.H. (Eds.), Medical Imaging 2018: Imaging Informatics for Healthcare, Research, and Applications, International Society for Optics and Photonics, SPIE. pp. 55–62. URL: https://doi.org/10.1117/12.2293584.
Araújo, T., Aresta, G., Mendonça, L., Penas, S., Maia, C., Carneiro, Â., Mendonça, A.M., Campilho, A., 2020. DR|GRADUATE: Uncertainty-aware deep learning-based diabetic retinopathy grading in eye fundus images. Medical Image Anal. 63, 101715. URL: https://doi.org/10.1016/j.media.2020.101715.
Badar, M., Haris, M., Fatima, A., 2020. Application of deep learning for retinal image analysis: A review. Comput. Sci. Rev. 35, 100203. URL: https://doi.org/10.1016/j.cosrev.2019.100203.
Badrinarayanan, V., Kendall, A., Cipolla, R., 2017. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2481–2495. URL: https://doi.org/10.1109/TPAMI.2016.2644615.
Baskaran, M., Foo, R.C., Cheng, C., Narayanaswamy, A., Zheng, Y., Wu, R., Saw, S., Foster, P.J., Wong, T.Y., Aung, T., 2015. The prevalence and types of glaucoma in an urban Chinese population: The Singapore Chinese Eye Study. JAMA Ophthalmology 133, 874–880.
Bourne, R.R.A., Stevens, G.A., White, R.A., Smith, J.L., Flaxman, S.R., Price, H., Jonas, J.B., Keeffe, J., Leasher, J., Naidoo, K., Pesudovs, K., Resnikoff, S., Taylor, H.R., 2013. Causes of vision loss worldwide, 1990–2010: a systematic analysis. The Lancet Global Health 1, e339–e349. URL: https://doi.org/10.1016/S2214-109X(13)70113-X.
Brandl, C., Breinlich, V.A., Stark, K., Enzinger, S., Asenmacher, M., Olden, M., Grassmann, F., Graw, J., Heier, M., Peters, A., et al., 2016.
Features of age-related macular degeneration in the general adults and their dependency on age, sex, and smoking: Results from the German KORA study. PLOS ONE 11.
Brown, J.M., Campbell, J.P., Beers, A., Chang, K., Ostmo, S., Chan, R.V.P., Dy, J., Erdogmus, D., Ioannidis, S., Kalpathy-Cramer, J., Chiang, M.F., for the Imaging and Informatics in Retinopathy of Prematurity (i-ROP) Research Consortium, 2018. Automated Diagnosis of Plus Disease in Retinopathy of Prematurity Using Deep Convolutional Neural Networks. JAMA Ophthalmology 136, 803–810. URL: https://doi.org/10.1001/jamaophthalmol.2018.1934.
Budai, A., Bock, R., Maier, A.K., Hornegger, J., Michelson, G., 2013. Robust vessel segmentation in fundus images. Int. J. Biomed. Imaging 2013, 154860:1–154860:11. URL: https://doi.org/10.1155/2013/154860.
Burlina, P., Freund, D.E., Joshi, N., Wolfson, Y., Bressler, N.M., 2016. Detection of age-related macular degeneration via deep learning, in: 13th IEEE International Symposium on Biomedical Imaging, ISBI 2016, Prague, Czech Republic, April 13-16, 2016, IEEE. pp. 184–188. URL: https://doi.org/10.1109/ISBI.2016.7493240.
Burlina, P.M., Joshi, N., Pacheco, K.D., Freund, D.E., Kong, J., Bressler, N.M., 2018. Use of Deep Learning for Detailed Severity Characterization and Estimation of 5-Year Risk Among Patients With Age-Related Macular Degeneration. JAMA Ophthalmology 136, 1359–1366. URL: https://doi.org/10.1001/jamaophthalmol.2018.4118.
Burlina, P.M., Joshi, N., Pacheco, K.D., Liu, T.Y.A., Bressler, N.M., 2019. Assessment of Deep Generative Models for High-Resolution Synthetic Retinal Image Generation of Age-Related Macular Degeneration. JAMA Ophthalmology 137, 258–264. URL: https://doi.org/10.1001/jamaophthalmol.2018.6156.
Burlina, P.M., Joshi, N., Pekala, M., Pacheco, K.D., Freund, D.E., Bressler, N.M., 2017.
Automated Grading of Age-Related Macular Degeneration From Color Fundus Images Using Deep Convolutional Neural Networks. JAMA Ophthalmology 135, 1170–1176. URL: https://doi.org/10.1001/jamaophthalmol.2017.3782.
California Healthcare Foundation, 2015. Diabetic retinopathy detection - identify signs of diabetic retinopathy in eye images.
Carmona, E.J., Rincón, M., García-Feijóo, J., Martínez-de-la-Casa, J.M., 2008. Identification of the optic nerve head with genetic algorithms. Artif. Intell. Medicine 43, 243–259. URL: https://doi.org/10.1016/j.artmed.2008.04.005.
Carson, L., Caroline, Y., Laura, H., Daniel, R., 2018. Retinal lesion detection with deep learning using image patches. Investigative Ophthalmology & Visual Science 59, 590–596.
Chai, Y., Liu, H., Xu, J., 2018. Glaucoma diagnosis based on both hidden features and domain knowledge through deep learning models. Knowl. Based Syst. 161, 147–156. URL: https://doi.org/10.1016/j.knosys.2018.07.043.
Chen, L., Papandreou, G., Schroff, F., Adam, H., 2017. Rethinking atrous convolution for semantic image segmentation. CoRR abs/1706.05587. URL: http://arxiv.org/abs/1706.05587.
Chen, L., Zhu, Y., Papandreou, G., Schroff, F., Adam, H., 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation, in: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (Eds.), Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part VII, Springer. pp. 833–851. URL: https://doi.org/10.1007/978-3-030-01234-2_49.
Chen, T., Guestrin, C., 2016. XGBoost: A scalable tree boosting system, in: Krishnapuram, B., Shah, M., Smola, A.J., Aggarwal, C.C., Shen, D., Rastogi, R. (Eds.), Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, ACM. pp. 785–794. URL: https://doi.org/10.1145/2939672.2939785.
Cherukuri, V., G, V.K.B., Bala, R., Monga, V., 2020.
Deep retinal image segmentation with regularization under geometric priors. IEEE Trans. Image Process. 29, 2552–2567. URL: https://doi.org/10.1109/TIP.2019.2946078.
Chung, J., Gülçehre, Ç., Cho, K., Bengio, Y., 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR abs/1412.3555. URL: http://arxiv.org/abs/1412.3555.
Ciulla, T.A., Amador, A.G., Zinman, B., 2003. Diabetic retinopathy and diabetic macular edema: Pathophysiology, screening, and novel therapies. Diabetes Care 26, 2653–2664.
Costa, P., Galdran, A., Meyer, M.I., Niemeijer, M., Abramoff, M., Mendonça, A.M., Campilho, A.J.C., 2018. End-to-end adversarial retinal image synthesis. IEEE Trans. Medical Imaging 37, 781–791. URL: https://doi.org/10.1109/TMI.2017.2759102.
Dai, L., Fang, R., Li, H., Hou, X., Sheng, B., Wu, Q., Jia, W., 2018. Clinical report guided retinal microaneurysm detection with multi-sieving deep learning. IEEE Trans. Med. Imaging 37, 1149–1161. URL: https://doi.org/10.1109/TMI.2018.2794988.
Dasgupta, A., Singh, S., 2017. A fully convolutional neural network based structured prediction approach towards the retinal vessel segmentation, in: 14th IEEE International Symposium on Biomedical Imaging, ISBI 2017, Melbourne, Australia, April 18-21, 2017, IEEE. pp. 248–251. URL: https://doi.org/10.1109/ISBI.2017.7950512.
Dashtbozorg, B., Mendonça, A.M., Campilho, A.J.C., 2014. An automatic graph-based approach for artery/vein classification in retinal images. IEEE Trans. Image Process. 23, 1073–1083. URL: https://doi.org/10.1109/TIP.2013.2263809.
Dashtbozorg, B., Zhang, J., Huang, F., ter Haar Romeny, B.M., 2018. Retinal microaneurysms detection using local convergence index features. IEEE Trans. Image Process. 27, 3300–3315. URL: https://doi.org/10.1109/TIP.2018.2815345.
David, A.M., Lou, Y., Ali, E., Warren, C., Ryan, A., Folk, J.C., Meindert, N., 2016.
Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning. Investigative Ophthalmology & Visual Science 57, 5200.
Decenciere, E., Cazuguel, G., Zhang, X., Thibault, G., Klein, J.C., Meyer, F., Marcotegui, B., Quellec, G., Lamard, M., Danno, R., et al., 2013. TeleOphta: Machine learning and image processing methods for teleophthalmology. IRBM 34, 196–203.
Decenciere, E., Zhang, X., Cazuguel, G., Lay, B., Cochener, B., Trone, C., Gain, P., Ordonez-Varela, J., Massin, P., Erginay, A., et al., 2014. Feedback on a publicly distributed image database: The Messidor database. Image Analysis & Stereology 33, 231–234.
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Li, F., 2009. ImageNet: A large-scale hierarchical image database, in: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami, Florida, USA, IEEE Computer Society. pp. 248–255. URL: https://doi.org/10.1109/CVPR.2009.5206848.
Deshmukh, A., Sivaswamy, J., 2019. Synthesis of optical nerve head region of fundus image, in: 16th IEEE International Symposium on Biomedical Imaging, ISBI 2019, Venice, Italy, April 8-11, 2019, IEEE. pp. 583–586. URL: https://doi.org/10.1109/ISBI.2019.8759414.
Diaz-Pinto, A., Colomer, A., Naranjo, V., Morales, S., Xu, Y., Frangi, A.F., 2019. Retinal image synthesis and semi-supervised learning for glaucoma assessment. IEEE Trans. Med. Imaging 38, 2211–2218. URL: https://doi.org/10.1109/TMI.2019.2903434.
Edupuganti, V.G., Chawla, A., Kale, A., 2018. Automatic optic disk and cup segmentation of fundus images using deep learning, in: 2018 IEEE International Conference on Image Processing, ICIP 2018, Athens, Greece, October 7-10, 2018, IEEE. pp. 2227–2231. URL: https://doi.org/10.1109/ICIP.2018.8451753.
Feng, S., Zhuo, Z., Pan, D., Tian, Q., 2020.
Ccnet: A cross-connected convo-lutional network for segmenting retinal vessels using multi-scale features.Neurocomputing 392, 268–276. URL: https://doi.org/10.1016/j.neucom.2018.10.098 , doi: .Feng, Z., Yang, J., Yao, L., 2017. Patch-based fully convolutional neu-ral network with skip connections for retinal blood vessel segmenta-tion, in: 2017 IEEE International Conference on Image Processing, ICIP2017, Beijing, China, September 17-20, 2017, IEEE. pp. 1742–1746.URL: https://doi.org/10.1109/ICIP.2017.8296580 , doi: .Foo, A., Hsu, W., Lee, M., Lim, G., Wong, T.Y., 2020. Multi-task learn-ing for diabetic retinopathy grading and lesion segmentation, in: TheThirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020,The Thirty-Second Innovative Applications of Artificial Intelligence Con-ference, IAAI 2020, The Tenth AAAI Symposium on Educational Ad-vances in Artificial Intelligence, EAAI 2020, New York, NY, USA, Febru-ary 7-12, 2020, AAAI Press. pp. 13267–13272. URL: https://aaai.org/ojs/index.php/AAAI/article/view/7035 .Frangi, A.F., Niessen, W.J., Vincken, K.L., Viergever, M.A., 1998. Multi-scale vessel enhancement filtering, in: Wells, W.M., Colchester, A., Delp,S. (Eds.), Medical Image Computing and Computer-Assisted Intervention— MICCAI’98, Springer Berlin Heidelberg, Berlin, Heidelberg. pp. 130–137.Fu, H., Cheng, J., Xu, Y., Wong, D.W.K., Liu, J., Cao, X., 2018a.Joint optic disc and cup segmentation based on multi-label deep net-work and polar transformation. IEEE Trans. Med. Imaging 37, 1597–1605. URL: https://doi.org/10.1109/TMI.2018.2791488 , doi: .Fu, H., Cheng, J., Xu, Y., Zhang, C., Wong, D.W.K., Liu, J., Cao, X., 2018b.Disc-aware ensemble network for glaucoma screening from fundus image.IEEE Trans. Med. Imaging 37, 2493–2501. URL: https://doi.org/10.1109/TMI.2018.2837012 , doi: .4Fu, H., Wang, B., Shen, J., Cui, S., Xu, Y., Liu, J., Shao, L., 2019. 
Eval-uation of retinal image quality assessment networks in di ff erent color-spaces, in: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou,S., Yap, P., Khan, A. (Eds.), Medical Image Computing and ComputerAssisted Intervention - MICCAI 2019 - 22nd International Conference,Shenzhen, China, October 13-17, 2019, Proceedings, Part I, Springer. pp.48–56. URL: https://doi.org/10.1007/978-3-030-32239-7_6 ,doi: .Fu, H., Xu, Y., Lin, S., Wong, D.W.K., Liu, J., 2016. Deepves-sel: Retinal vessel segmentation via deep learning and conditional ran-dom field, in: Ourselin, S., Joskowicz, L., Sabuncu, M.R., ¨Unal,G.B., Wells, W. (Eds.), Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016 - 19th International Conference,Athens, Greece, October 17-21, 2016, Proceedings, Part II, pp. 132–139. URL: https://doi.org/10.1007/978-3-319-46723-8_16 ,doi: .Fumero, F., Alay´on, S., S´anchez, J.L., Sigut, J.F., Gonz´alez-Hern´andez,M., 2011. RIM-ONE: an open retinal image database for optic nerveevaluation, in: Proceedings of the 24th IEEE International Sympo-sium on Computer-Based Medical Systems, 27-30 June, 2011, Bristol,United Kingdom, IEEE Computer Society. pp. 1–6. URL: https://doi.org/10.1109/CBMS.2011.5999143 , doi: .Galdran, A., Meyer, M.I., Costa, P., Mendonc¸a, A.M., Campilho, A.,2019. Uncertainty-aware artery / vein classification on retinal images, in:16th IEEE International Symposium on Biomedical Imaging, ISBI 2019,Venice, Italy, April 8-11, 2019, IEEE. pp. 556–560. URL: https://doi.org/10.1109/ISBI.2019.8759380 , doi: .Gargeya, R., Leng, T., 2017. Automated identification of diabetic retinopathyusing deep learning. Ophthalmology , S0161642016317742.Garway-Heath, D.F., Hitchings, R., 1998. Quantitative evaluation of the opticnerve head in early glaucoma. 
British Journal of Ophthalmology 82, 352–361.Ghafoorian, M., Mehrtash, A., Kapur, T., Karssemeijer, N., Marchiori,E., Pesteie, M., Guttmann, C.R.G., de Leeuw, F., Tempany, C.M.,van Ginneken, B., Fedorov, A., Abolmaesumi, P., Platel, B., III,W.M.W., 2017. Transfer learning for domain adaptation in MRI: ap-plication in brain lesion segmentation, in: Descoteaux, M., Maier-Hein, L., Franz, A.M., Jannin, P., Collins, D.L., Duchesne, S. (Eds.),Medical Image Computing and Computer Assisted Intervention - MIC-CAI 2017 - 20th International Conference, Quebec City, QC, Canada,September 11-13, 2017, Proceedings, Part III, Springer. pp. 516–524.URL: https://doi.org/10.1007/978-3-319-66179-7_59 , doi: .Giancardo, L., Meriaudeau, F., Karnowski, T.P., Li, Y., Garg, S., Tobin, K.W.,Chaum, E., 2012. Exudate-based diabetic macular edema detection in fun-dus images using publicly available datasets. Medical Image Analysis 16,216 – 226. URL: , doi: https://doi.org/10.1016/j.media.2011.07.004 .Gondal, W.M., K¨ohler, J.M., Grzeszick, R., Fink, G.A., Hirsch, M., 2017.Weakly-supervised localization of diabetic retinopathy lesions in retinalfundus images, in: 2017 IEEE International Conference on Image Process-ing, ICIP 2017, Beijing, China, September 17-20, 2017, IEEE. pp. 2069–2073. URL: https://doi.org/10.1109/ICIP.2017.8296646 , doi: .Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D.,Ozair, S., Courville, A.C., Bengio, Y., 2014. Generative adversarial nets,in: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Wein-berger, K.Q. (Eds.), Advances in Neural Information Processing Sys-tems 27: Annual Conference on Neural Information Processing Sys-tems 2014, December 8-13 2014, Montreal, Quebec, Canada, pp. 2672–2680. URL: http://papers.nips.cc/paper/5423-generative-adversarial-nets .Govindaiah, A., Hussain, M.A., Smith, R.T., Bhuiyan, A., 2018. 
Deep con-volutional neural network based screening and assessment of age-relatedmacular degeneration from fundus images, in: 15th IEEE InternationalSymposium on Biomedical Imaging, ISBI 2018, Washington, DC, USA,April 4-7, 2018, IEEE. pp. 1525–1528. URL: https://doi.org/10.1109/ISBI.2018.8363863 , doi: .Grassmann, F., Mengelkamp, J., Brandl, C., Harsch, S., Weber, B.H.F., 2018. A deep learning algorithm for prediction of age-related eye disease studyseverity scale for age-related macular degeneration from color fundus pho-tography. Ophthalmology 125, 8280209.van Grinsven, M.J.J.P., van Ginneken, B., Hoyng, C.B., Theelen, T., S´anchez,C.I., 2016. Fast convolutional neural network training using selective datasampling: Application to hemorrhage detection in color fundus images.IEEE Trans. Med. Imaging 35, 1273–1284. URL: https://doi.org/10.1109/TMI.2016.2526689 , doi: .Gulshan, V., Peng, L., Coram, M., Stumpe, M.C., Wu, D., Narayanaswamy,A., Venugopalan, S., Widner, K., Madams, T., Cuadros, J., Kim, R.,Raman, R., Nelson, P.C., Mega, J.L., Webster, D.R., 2016. Develop-ment and Validation of a Deep Learning Algorithm for Detection of Di-abetic Retinopathy in Retinal Fundus Photographs. JAMA 316, 2402–2410. URL: https://doi.org/10.1001/jama.2016.17216 , doi: .Gulshan, V., Rajan, R.P., Widner, K., Wu, D., Wubbels, P., Rhodes, T., White-house, K., Coram, M., Corrado, G., Ramasamy, K., Raman, R., Peng, L.,Webster, D.R., 2019. Performance of a Deep-Learning Algorithm vs Man-ual Grading for Detecting Diabetic Retinopathy in India. JAMA Ophthal-mology 137, 987–993. URL: https://doi.org/10.1001/jamaophthalmol.2019.2004 , doi: .Guo, S., Li, T., Kang, H., Li, N., Zhang, Y., Wang, K., 2019. L-seg: An end-to-end unified framework for multi-lesion segmentation of fundus images.Neurocomputing 349, 52–63. URL: https://doi.org/10.1016/j.neucom.2019.04.019 , doi: .Guo, S., Wang, K., Kang, H., Liu, T., Gao, Y., Li, T., 2020a. Bin loss forhard exudates segmentation in fundus images. 
Neurocomputing 392, 314–324. URL: https://doi.org/10.1016/j.neucom.2018.10.103 ,doi: .Guo, Y., Wang, R., Zhou, X., Liu, Y., Wang, L., Lv, C., Lv, B., Xie, G., 2020b.Lesion-aware segmentation network for atrophy and detachment of patho-logical myopia on fundus images, in: 17th IEEE International Sympo-sium on Biomedical Imaging, ISBI 2020, Iowa City, IA, USA, April 3-7,2020, IEEE. pp. 1242–1245. URL: https://doi.org/10.1109/ISBI45749.2020.9098669 , doi: .Hajabdollahi, M., Esfandiarpoor, R., Najarian, K., Karimi, N., Samavi,S., Soroushmehr, S.M.R., 2018. Low complexity convolutional neu-ral network for vessel segmentation in portable retinal diagnostic de-vices, in: 2018 IEEE International Conference on Image Processing,ICIP 2018, Athens, Greece, October 7-10, 2018, IEEE. pp. 2785–2789.URL: https://doi.org/10.1109/ICIP.2018.8451665 , doi: .He, K., Gkioxari, G., Doll´ar, P., Girshick, R.B., 2017. Mask R-CNN, in: IEEEInternational Conference on Computer Vision, ICCV 2017, Venice, Italy,October 22-29, 2017, IEEE Computer Society. pp. 2980–2988. URL: https://doi.org/10.1109/ICCV.2017.322 , doi: .He, K., Sun, J., Tang, X., 2013. Guided image filtering. IEEE Trans. Pat-tern Anal. Mach. Intell. 35, 1397–1409. URL: https://doi.org/10.1109/TPAMI.2012.213 , doi: .He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for imagerecognition, in: 2016 IEEE Conference on Computer Vision and PatternRecognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, IEEEComputer Society. pp. 770–778. URL: https://doi.org/10.1109/CVPR.2016.90 , doi: .He, Q., Zou, B., Zhu, C., Liu, X., Fu, H., Wang, L., 2018. Multi-labelclassification scheme based on local regression for retinal vessel seg-mentation, in: 2018 IEEE International Conference on Image Process-ing, ICIP 2018, Athens, Greece, October 7-10, 2018, IEEE. pp. 2765–2769. URL: https://doi.org/10.1109/ICIP.2018.8451415 , doi: .He, X., Zhou, Y., Wang, B., Cui, S., Shao, L., 2019. 
Dme-net: Di-abetic macular edema grading by auxiliary task learning, in: Shen,D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P.,Khan, A. (Eds.), Medical Image Computing and Computer Assisted In-tervention - MICCAI 2019 - 22nd International Conference, Shenzhen,China, October 13-17, 2019, Proceedings, Part I, Springer. pp. 788–796. URL: https://doi.org/10.1007/978-3-030-32239-7_87 ,doi: .Hernandez-Matas, C., Zabulis, X., Triantafyllou, A., Anyfanti, P., Douma, S.,Argyros, A., 2017. Fire: Fundus image registration dataset. Journal forModeling in Opthalmology (to appear) .5Hervella, ´A.S., Rouco, J., Novo, J., Ortega, M., 2018. Retinal imageunderstanding emerges from self-supervised multimodal reconstruction,in: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-L´opez, C.,Fichtinger, G. (Eds.), Medical Image Computing and Computer AssistedIntervention - MICCAI 2018 - 21st International Conference, Granada,Spain, September 16-20, 2018, Proceedings, Part I, Springer. pp. 321–328. URL: https://doi.org/10.1007/978-3-030-00928-1_37 ,doi: .Hoover, A.W., Kouznetsova, V., Goldbaum, M.H., 2000. Locating blood ves-sels in retinal images by piece-wise threshold probing of a matched filterresponse. IEEE Trans. Medical Imaging 19, 203–210. URL: https://doi.org/10.1109/42.845178 , doi: .Horta, A., Joshi, N., Pekala, M., Pacheco, K.D., Kong, J., Bressler, N.M.,Freund, D.E., Burlina, P., 2017. A hybrid approach for incorporating deepvisual features and side channel information with applications to AMD de-tection, in: Chen, X., Luo, B., Luo, F., Palade, V., Wani, M.A. (Eds.), 16thIEEE International Conference on Machine Learning and Applications,ICMLA 2017, Cancun, Mexico, December 18-21, 2017, IEEE. pp. 716–720. URL: https://doi.org/10.1109/ICMLA.2017.00-75 , doi: .Hu, J., Chen, Y., Zhong, J., Ju, R., Yi, Z., 2019. Automated analysis forretinopathy of prematurity by deep neural networks. IEEE Trans. Med.Imaging 38, 269–279. 
URL: https://doi.org/10.1109/TMI.2018.2863562 , doi: .Hu, K., Zhang, Z., Niu, X., Zhang, Y., Cao, C., Xiao, F., Gao, X., 2018.Retinal vessel segmentation of color fundus images using multiscale con-volutional neural network with an improved cross-entropy loss function.Neurocomputing 309, 179–191. URL: https://doi.org/10.1016/j.neucom.2018.05.011 , doi: .Hu, Q., Abr`amo ff , M.D., Garvin, M.K., 2013. Automated sepa-ration of binary overlapping trees in low-contrast color retinal im-ages, in: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N.(Eds.), Medical Image Computing and Computer-Assisted Interven-tion - MICCAI 2013 - 16th International Conference, Nagoya, Japan,September 22-26, 2013, Proceedings, Part II, Springer. pp. 436–443.URL: https://doi.org/10.1007/978-3-642-40763-5_54 , doi: .Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q., 2017. Densely con-nected convolutional networks, in: 2017 IEEE Conference on ComputerVision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, IEEE Computer Society. pp. 2261–2269. URL: https://doi.org/10.1109/CVPR.2017.243 , doi: .Huang, Y., Lin, L., Li, M., Wu, J., Cheng, P., Wang, K., Yuan, J., Tang,X., 2020. Automated hemorrhage detection from coarsely annotated fun-dus images in diabetic retinopathy, in: 17th IEEE International Sympo-sium on Biomedical Imaging, ISBI 2020, Iowa City, IA, USA, April 3-7,2020, IEEE. pp. 1369–1372. URL: https://doi.org/10.1109/ISBI45749.2020.9098319 , doi: .Jiang, Y., Duan, L., Cheng, J., Gu, Z., Xia, H., Fu, H., Li, C., Liu, J.,2020. Jointrcnn: A region-based convolutional neural network for opticdisc and cup segmentation. IEEE Trans. Biomed. Engineering 67, 335–343. 
URL: https://doi.org/10.1109/TBME.2019.2913211 , doi: .Kairouz, P., McMahan, H.B., Avent, B., Bellet, A., Bennis, M., Bhagoji,A.N., Bonawitz, K., Charles, Z., Cormode, G., Cummings, R., D’Oliveira,R.G.L., Rouayheb, S.E., Evans, D., Gardner, J., Garrett, Z., Gasc´on, A.,Ghazi, B., Gibbons, P.B., Gruteser, M., Harchaoui, Z., He, C., He, L.,Huo, Z., Hutchinson, B., Hsu, J., Jaggi, M., Javidi, T., Joshi, G., Kho-dak, M., Konecn´y, J., Korolova, A., Koushanfar, F., Koyejo, S., Lep-oint, T., Liu, Y., Mittal, P., Mohri, M., Nock, R., ¨Ozg¨ur, A., Pagh, R.,Raykova, M., Qi, H., Ramage, D., Raskar, R., Song, D., Song, W., Stich,S.U., Sun, Z., Suresh, A.T., Tram`er, F., Vepakomma, P., Wang, J., Xiong,L., Xu, Z., Yang, Q., Yu, F.X., Yu, H., Zhao, S., 2019. Advances andopen problems in federated learning. CoRR abs / http://arxiv.org/abs/1912.04977 .Kanse, S.S., Yadav, D.M., 2019. Retinal fundus image for glaucoma de-tection: A review and study. J. Intelligent Systems 28, 43–56. URL: https://doi.org/10.1515/jisys-2016-0258 , doi: .Kauppi, T., Kalesnykiene, V., Kamarainen, J., Lensu, L., Sorri, I., Raninen,A., Voutilainen, R., Uusitalo, H., K¨alvi¨ainen, H., Pietil¨a, J., 2007. TheDIARETDB1 diabetic retinopathy database and evaluation protocol, in: Rajpoot, N.M., Bhalerao, A.H. (Eds.), Proceedings of the British MachineVision Conference 2007, University of Warwick, UK, September 10-13,2007, British Machine Vision Association. pp. 1–10. URL: https://doi.org/10.5244/C.21.15 , doi: .Kauppi, T., Kalesnykiene, V., kristian Kamarainen, J., Lensu, L., Sorri, I.,Uusitalo, H., K¨alvi¨ainen, H., Pietil¨a, J., 2006. Diaretdb0: Evaluationdatabase and methodology for diabetic retinopathy algorithms.Keel, S., Wu, J., Lee, P.Y., Scheetz, J., He, M., 2019. Visualizing DeepLearning Models for the Detection of Referable Diabetic Retinopathyand Glaucoma. JAMA Ophthalmology 137, 288–292. URL: https://doi.org/10.1001/jamaophthalmol.2018.6035 , doi: .Khalaf, A.F., Yassine, I.A., Fahmy, A.S., 2016. 
Convolutional neu-ral networks for deep feature learning in retinal vessel segmentation,in: 2016 IEEE International Conference on Image Processing, ICIP2016, Phoenix, AZ, USA, September 25-28, 2016, IEEE. pp. 385–388.URL: https://doi.org/10.1109/ICIP.2016.7532384 , doi: .Kr¨ahenb¨uhl, P., Koltun, V., 2011. E ffi cient inference in fully connectedcrfs with gaussian edge potentials, in: Shawe-Taylor, J., Zemel, R.S.,Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (Eds.), Advances inNeural Information Processing Systems 24: 25th Annual Conference onNeural Information Processing Systems 2011. Proceedings of a meetingheld 12-14 December 2011, Granada, Spain, pp. 109–117. URL: http://papers.nips.cc/paper/4296-efficient-inference-in-fully-connected-crfs-with-gaussian-edge-potentials .Krause, J., Gulshan, V., Rahimy, E., Karth, P., Webster, D.R., 2017. Gradervariability and the importance of reference standards for evaluating ma-chine learning models for diabetic retinopathy. Ophthalmology 125.Kromm, C., Rohr, K., 2020. Inception capsule network for retinalblood vessel segmentation and centerline extraction, in: 17th IEEEInternational Symposium on Biomedical Imaging, ISBI 2020, IowaCity, IA, USA, April 3-7, 2020, IEEE. pp. 1223–1226. URL: https://doi.org/10.1109/ISBI45749.2020.9098538 , doi: .de La Torre, J., Valls, A., Puig, D., 2020. A deep learning interpretable clas-sifier for diabetic retinopathy disease grading. Neurocomputing 396, 465–476. URL: https://doi.org/10.1016/j.neucom.2018.07.102 ,doi: .Li, C., Ye, J., He, J., Wang, S., Qiao, Y., Gu, L., 2020a. Dense correlation net-work for automated multi-label ocular disease detection with paired colorfundus photographs, in: 17th IEEE International Symposium on Biomedi-cal Imaging, ISBI 2020, Iowa City, IA, USA, April 3-7, 2020, IEEE. pp. 1–4. URL: https://doi.org/10.1109/ISBI45749.2020.9098340 ,doi: .Li, L., Xu, M., Liu, H., Li, Y., Wang, X., Jiang, L., Wang, Z., Fan,X., Wang, N., 2020b. 
A large-scale database and a CNN model forattention-based glaucoma detection. IEEE Trans. Med. Imaging 39, 413–424. URL: https://doi.org/10.1109/TMI.2019.2927226 , doi: .Li, L., Xu, M., Wang, X., Jiang, L., Liu, H., 2019a. Attention basedglaucoma detection: A large-scale database and CNN model,in: IEEE Conference on Computer Vision and Pattern Recog-nition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019,Computer Vision Foundation / IEEE. pp. 10571–10580. URL: http://openaccess.thecvf.com/content_CVPR_2019/html/Li_Attention_Based_Glaucoma_Detection_A_Large-Scale_Database_and_CNN_Model_CVPR_2019_paper.html , doi: .Li, T., Gao, Y., Wang, K., Guo, S., Liu, H., Kang, H., 2019b. Diagnosticassessment of deep learning algorithms for diabetic retinopathy screening.Inf. Sci. 501, 511–522. URL: https://doi.org/10.1016/j.ins.2019.06.011 , doi: .Li, X., Hu, X., Yu, L., Zhu, L., Fu, C., Heng, P., 2020c. Canet:Cross-disease attention network for joint diabetic retinopathy and dia-betic macular edema grading. IEEE Trans. Med. Imaging 39, 1483–1493. URL: https://doi.org/10.1109/TMI.2019.2951844 , doi: .Li, Z., He, Y., Keel, S., Meng, W., Chang, R.T., He, M., 2018a. E ffi cacy of adeep learning system for detecting glaucomatous optic neuropathy basedon color fundus photographs. Ophthalmology , S0161642017335650.Li, Z., Keel, S., Liu, C., He, Y., Meng, W., Scheetz, J., Lee, P.Y., Shaw,6 J., Ting, D., Wong, T., Taylor, H., Chang, R., He, M., 2018b. An au-tomated grading system for detection of vision-threatening referable di-abetic retinopathy on the basis of color fundus photographs. DiabetesCare URL: https://care.diabetesjournals.org/content/early/2018/09/27/dc18-0147 , doi: .Liao, W., Zou, B., Zhao, R., Chen, Y., He, Z., Zhou, M., 2020. Clinical inter-pretable deep learning model for glaucoma diagnosis. IEEE J. Biomed.Health Informatics 24, 1405–1412. URL: https://doi.org/10.1109/JBHI.2019.2949075 , doi: .Lim, G., Cheng, Y., Hsu, W., Lee, M., 2015. 
Integrated optic disc andcup segmentation with deep learning, in: 27th IEEE International Con-ference on Tools with Artificial Intelligence, ICTAI 2015, Vietri sulMare, Italy, November 9-11, 2015, IEEE Computer Society. pp. 162–169. URL: https://doi.org/10.1109/ICTAI.2015.36 , doi: .Lim, G., Lim, Z.W., Xu, D., Ting, D.S.W., Wong, T.Y., Lee, M., Hsu, W.,2019. Feature isolation for hypothesis testing in retinal imaging: An is-chemic stroke prediction case study, in: The Thirty-Third AAAI Confer-ence on Artificial Intelligence, AAAI 2019, The Thirty-First InnovativeApplications of Artificial Intelligence Conference, IAAI 2019, The NinthAAAI Symposium on Educational Advances in Artificial Intelligence,EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019,AAAI Press. pp. 9510–9515. URL: https://doi.org/10.1609/aaai.v33i01.33019510 , doi: .Lin, T., Goyal, P., Girshick, R.B., He, K., Doll´ar, P., 2017. Focal loss for denseobject detection, in: IEEE International Conference on Computer Vision,ICCV 2017, Venice, Italy, October 22-29, 2017, IEEE Computer Society.pp. 2999–3007. URL: https://doi.org/10.1109/ICCV.2017.324 ,doi: .Lin, Z., Guo, R., Wang, Y., Wu, B., Chen, T., Wang, W., Chen, D.Z., Wu,J., 2018. A framework for identifying diabetic retinopathy based on anti-noise detection and attention-based fusion, in: Frangi, A.F., Schnabel, J.A.,Davatzikos, C., Alberola-L´opez, C., Fichtinger, G. (Eds.), Medical ImageComputing and Computer Assisted Intervention - MICCAI 2018 - 21st In-ternational Conference, Granada, Spain, September 16-20, 2018, Proceed-ings, Part II, Springer. pp. 74–82. URL: https://doi.org/10.1007/978-3-030-00934-2_9 , doi: .Liskowski, P., Krawiec, K., 2016. Segmenting retinal blood vessels withdeep neural networks. IEEE Trans. Med. Imaging 35, 2369–2380.URL: https://doi.org/10.1109/TMI.2016.2546227 , doi: .Liu, B., Gu, L., Lu, F., 2019a. 
Unsupervised ensemble strategy forretinal vessel segmentation, in: Shen, D., Liu, T., Peters, T.M.,Staib, L.H., Essert, C., Zhou, S., Yap, P., Khan, A. (Eds.), Med-ical Image Computing and Computer Assisted Intervention - MIC-CAI 2019 - 22nd International Conference, Shenzhen, China, Oc-tober 13-17, 2019, Proceedings, Part I, Springer. pp. 111–119.URL: https://doi.org/10.1007/978-3-030-32239-7_13 , doi: .Liu, C., Wang, W., Li, Z., Jiang, Y., Han, X., Ha, J., Meng, W., He, M.,2019b. Biological age estimated from retinal imaging: A novel biomarkerof aging, in: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou,S., Yap, P., Khan, A. (Eds.), Medical Image Computing and ComputerAssisted Intervention - MICCAI 2019 - 22nd International Conference,Shenzhen, China, October 13-17, 2019, Proceedings, Part I, Springer. pp.138–146. URL: https://doi.org/10.1007/978-3-030-32239-7_16 , doi: .Liu, H., Li, L., Wormstone, I.M., Qiao, C., Zhang, C., Liu, P., Li, S., Wang,H., Mou, D., Pang, R., Yang, D., Zangwill, L.M., Moghimi, S., Hou,H., Bowd, C., Jiang, L., Chen, Y., Hu, M., Xu, Y., Kang, H., Ji, X.,Chang, R., Tham, C., Cheung, C., Ting, D.S.W., Wong, T.Y., Wang,Z., Weinreb, R.N., Xu, M., Wang, N., 2019c. Development and Vali-dation of a Deep Learning System to Detect Glaucomatous Optic Neu-ropathy Using Fundus Photographs. JAMA Ophthalmology 137, 1353–1360. URL: https://doi.org/10.1001/jamaophthalmol.2019.3501 , doi: .Liu, P., Kong, B., Li, Z., Zhang, S., Fang, R., 2019d. CFEA: col-laborative feature ensembling adaptation for domain adaptation in un-supervised optic disc and cup segmentation, in: Shen, D., Liu, T.,Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P., Khan, A.(Eds.), Medical Image Computing and Computer Assisted Intervention- MICCAI 2019 - 22nd International Conference, Shenzhen, China, October 13-17, 2019, Proceedings, Part V, Springer. pp. 
521–529.URL: https://doi.org/10.1007/978-3-030-32254-0_58 , doi: .Liu, Q., Hong, X., Li, S., Chen, Z., Zhao, G., Zou, B., 2019e. A spatial-aware joint optic disc and cup segmentation method. Neurocomputing359, 285–297. URL: https://doi.org/10.1016/j.neucom.2019.05.039 , doi: .Liu, Y., Cheng, M., Hu, X., Wang, K., Bai, X., 2017. Richer convolutionalfeatures for edge detection, in: 2017 IEEE Conference on Computer Visionand Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26,2017, IEEE Computer Society. pp. 5872–5881. URL: https://doi.org/10.1109/CVPR.2017.622 , doi: .Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks forsemantic segmentation, in: IEEE Conference on Computer Vision andPattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015,IEEE Computer Society. pp. 3431–3440. URL: https://doi.org/10.1109/CVPR.2015.7298965 , doi: .Lowell, J., Hunter, A., Steel, D., Basu, A., Ryder, R., Fletcher, E., Kennedy,L., 2004. Optic nerve head segmentation. IEEE Trans. Medical Imaging23, 256–264. URL: https://doi.org/10.1109/TMI.2003.823261 ,doi: .Luo, W., Li, Y., Urtasun, R., Zemel, R.S., 2016. Understanding thee ff ective receptive field in deep convolutional neural networks, in:Lee, D.D., Sugiyama, M., von Luxburg, U., Guyon, I., Garnett, R.(Eds.), Advances in Neural Information Processing Systems 29: AnnualConference on Neural Information Processing Systems 2016, Decem-ber 5-10, 2016, Barcelona, Spain, pp. 4898–4906. URL: http://papers.nips.cc/paper/6203-understanding-the-effective-receptive-field-in-deep-convolutional-neural-networks .Ma, W., Yu, S., Ma, K., Wang, J., Ding, X., Zheng, Y., 2019. 
Multi-task neural networks with spatial activation for retinal vessel seg-mentation and artery / vein classification, in: Shen, D., Liu, T., Pe-ters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P., Khan, A.(Eds.), Medical Image Computing and Computer Assisted Intervention- MICCAI 2019 - 22nd International Conference, Shenzhen, China,October 13-17, 2019, Proceedings, Part I, Springer. pp. 769–778.URL: https://doi.org/10.1007/978-3-030-32239-7_85 , doi: .Mahapatra, D., Bozorgtabar, B., Hewavitharanage, S., Garnavi, R., 2017.Image super resolution using generative adversarial networks and lo-cal saliency maps for retinal image analysis, in: Descoteaux, M.,Maier-Hein, L., Franz, A.M., Jannin, P., Collins, D.L., Duchesne, S.(Eds.), Medical Image Computing and Computer Assisted Intervention- MICCAI 2017 - 20th International Conference, Quebec City, QC,Canada, September 11-13, 2017, Proceedings, Part III, Springer. pp. 382–390. URL: https://doi.org/10.1007/978-3-319-66179-7_44 ,doi: .Maninis, K., Pont-Tuset, J., Arbel´aez, P.A., Gool, L.V., 2016. Deep reti-nal image understanding, in: Ourselin, S., Joskowicz, L., Sabuncu, M.R.,¨Unal, G.B., Wells, W. (Eds.), Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016 - 19th International Conference,Athens, Greece, October 17-21, 2016, Proceedings, Part II, pp. 140–148. URL: https://doi.org/10.1007/978-3-319-46723-8_17 ,doi: .Massin, P., Chabouis, A., Erginay, A., Viens-Bitker, C., Lecleire-Collet,A., Meas, T., Guillausseau, P.J., Choupot, G., Andr´e, B., Denormandie,P., 2008. Ophdiat: A telemedical network screening system for dia-betic retinopathy in the ile-de-france. Diabetes & Metabolism 34, 227 –234. URL: , doi: https://doi.org/10.1016/j.diabet.2007.12.006 .Mathis Antony, d., 2015. Team o o solution summary. .Meng, Q., Hashimoto, Y., Satoh, S., 2020. 
How to extract more informationwith less burden: Fundus image classification and retinal disease localiza-tion with ophthalmologist intervention, in: 17th IEEE International Sym-posium on Biomedical Imaging, ISBI 2020, Iowa City, IA, USA, April 3-7,2020, IEEE. pp. 1373–1377. URL: https://doi.org/10.1109/ISBI45749.2020.9098600 , doi: .Meyer, M.I., Galdran, A., Mendonc¸a, A.M., Campilho, A., 2018. A pixel-wise distance regression approach for joint retinal optical disc and fovea7detection, in: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-L´opez, C., Fichtinger, G. (Eds.), Medical Image Computing and ComputerAssisted Intervention - MICCAI 2018 - 21st International Conference,Granada, Spain, September 16-20, 2018, Proceedings, Part II, Springer. pp.39–47. URL: https://doi.org/10.1007/978-3-030-00934-2_5 ,doi: .Mishra, S., Chen, D.Z., Hu, X.S., 2020. A data-aware deep supervisedmethod for retinal vessel segmentation, in: 17th IEEE International Sym-posium on Biomedical Imaging, ISBI 2020, Iowa City, IA, USA, April 3-7,2020, IEEE. pp. 1254–1257. URL: https://doi.org/10.1109/ISBI45749.2020.9098403 , doi: .Mo, J., Zhang, L., Feng, Y., 2018. Exudate-based diabetic macular edemarecognition in retinal images using cascaded deep residual networks. Neu-rocomputing 290, 161–171. URL: https://doi.org/10.1016/j.neucom.2018.02.035 , doi: .Moccia, S., Momi, E.D., Hadji, S.E., Mattos, L.S., 2018. Bloodvessel segmentation algorithms - review of methods, datasets andevaluation metrics. Comput. Methods Programs Biomed. 158, 71–91. URL: https://doi.org/10.1016/j.cmpb.2018.02.001 , doi: .Mohan, D., Kumar, J.R.H., Seelamantula, C.S., 2018. High-performanceoptic disc segmentation using convolutional neural networks, in: 2018IEEE International Conference on Image Processing, ICIP 2018, Athens,Greece, October 7-10, 2018, IEEE. pp. 4038–4042. URL: https://doi.org/10.1109/ICIP.2018.8451543 , doi: .Mohan, D., Kumar, J.R.H., Seelamantula, C.S., 2019. 
Optic disc seg-mentation using cascaded multiresolution convolutional neural networks,in: 2019 IEEE International Conference on Image Processing, ICIP2019, Taipei, Taiwan, September 22-25, 2019, IEEE. pp. 834–838.URL: https://doi.org/10.1109/ICIP.2019.8804267 , doi: .Nasery, V., Soundararajan, K.B., Galeotti, J.M., 2020. Learning to seg-ment vessels from poorly illuminated fundus images, in: 17th IEEEInternational Symposium on Biomedical Imaging, ISBI 2020, IowaCity, IA, USA, April 3-7, 2020, IEEE. pp. 1232–1236. URL: https://doi.org/10.1109/ISBI45749.2020.9098694 , doi: .Natarajan, S., Jain, A., Krishnan, R., Rogye, A., Sivaprasad, S., 2019. Di-agnostic Accuracy of Community-Based Diabetic Retinopathy Screen-ing With an O ffl ine Artificial Intelligence System on a Smartphone.JAMA Ophthalmology 137, 1182–1188. URL: https://doi.org/10.1001/jamaophthalmol.2019.2923 , doi: .Nguyen, H.V., Tan, G.S.W., Tapp, R.J., Mital, S., Ting, D.S.W., Wong, H.T.,Tan, C.S., Laude, A., Tai, E.S., Tan, N.C.a., 2016. Cost-e ff ectiveness of anational telemedicine diabetic retinopathy screening program in singapore.Ophthalmology , 2571–2580.Niemeijer, M., van Ginneken, B., Cree, M.J., Mizutani, A., Quellec, G.,S´anchez, C.I., Zhang, B., Hornero, R., Lamard, M., Muramatsu, C., Wu,X., Cazuguel, G., You, J., Mayo, A., Li, Q., Hatanaka, Y., Cochener,B., Roux, C., Karray, F., Garc´ıa, M., Fujita, H., Abr`amo ff , M.D., 2010.Retinopathy online challenge: Automatic detection of microaneurysms indigital color fundus photographs. IEEE Trans. Medical Imaging 29, 185–195. URL: https://doi.org/10.1109/TMI.2009.2033909 , doi: .Niemeijer, M., Xu, X., Dumitrescu, A.V., Gupta, P., van Ginneken, B., Folk,J.C., Abr`amo ff , M.D., 2011. Automated measurement of the arteriolar-to-venular width ratio in digital color fundus photographs. IEEE Trans.Med. Imaging 30, 1941–1950. 
URL: https://doi.org/10.1109/TMI.2011.2159619 , doi: .Niu, Y., Gu, L., Lu, F., Lv, F., Wang, Z., Sato, I., Zhang, Z., Xiao, Y., Dai,X., Cheng, T., 2019. Pathological evidence exploration in deep retinalimage diagnosis, in: The Thirty-Third AAAI Conference on Artificial In-telligence, AAAI 2019, The Thirty-First Innovative Applications of Arti-ficial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposiumon Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu,Hawaii, USA, January 27 - February 1, 2019, AAAI Press. pp. 1093–1101. URL: https://doi.org/10.1609/aaai.v33i01.33011093 ,doi: .Oliveira, A., Pereira, S., Silva, C.A., 2018. Retinal vessel segmentation basedon fully convolutional neural networks. Expert Syst. Appl. 112, 229– 242. URL: https://doi.org/10.1016/j.eswa.2018.06.034 , doi: .Orlando, J.I., Breda, J.B., van Keer, K., Blaschko, M.B., Blanco, P.J., Bu-lant, C.A., 2018. Towards a glaucoma risk index based on simulatedhemodynamics from fundus images, in: Frangi, A.F., Schnabel, J.A., Da-vatzikos, C., Alberola-L´opez, C., Fichtinger, G. (Eds.), Medical ImageComputing and Computer Assisted Intervention - MICCAI 2018 - 21st In-ternational Conference, Granada, Spain, September 16-20, 2018, Proceed-ings, Part II, Springer. pp. 65–73. URL: https://doi.org/10.1007/978-3-030-00934-2_8 , doi: .Orlando, J.I., Fu, H., Breda, J.B., van Keer, K., Bathula, D.R., Diaz-Pinto, A.,Fang, R., Heng, P., Kim, J., Lee, J., Lee, J., Li, X., Liu, P., Lu, S., Muruge-san, B., Naranjo, V., Phaye, S.S.R., Shankaranarayana, S.M., Bogunovic,H., 2020. REFUGE challenge: A unified framework for evaluating auto-mated methods for glaucoma assessment from fundus photographs. Med-ical Image Anal. 59. URL: https://doi.org/10.1016/j.media.2019.101570 , doi: .Owen, C.G., Rudnicka, A.R., Mullen, R.J., Barman, S., Monekosso, D.,Whincup, P.H., Ng, J., Paterson, C., 2009. 
Measuring retinal vessel tortuosity in 10-year-old children: validation of the computer-assisted image analysis of the retina (CAIAR) program. Investigative Ophthalmology & Visual Science 50, 2004–2010.
Pal, A., Moorthy, M.R., Shahina, A., 2018. G-EyeNet: A convolutional autoencoding classifier framework for the detection of glaucoma from retinal fundus images, in: 2018 IEEE International Conference on Image Processing, ICIP 2018, Athens, Greece, October 7-10, 2018, IEEE. pp. 2775–2779. URL: https://doi.org/10.1109/ICIP.2018.8451029.
Peng, C., Zhang, X., Yu, G., Luo, G., Sun, J., 2017. Large kernel matters - improve semantic segmentation by global convolutional network, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, IEEE Computer Society. pp. 1743–1751. URL: https://doi.org/10.1109/CVPR.2017.189.
Peng, Y., Dharssi, S., Chen, Q., Keenan, T.D., Agrón, E., Wong, W.T., Chew, E.Y., Lu, Z., 2018. DeepSeeNet: A deep learning model for automated classification of patient-based age-related macular degeneration severity from color fundus photographs. Ophthalmology.
Phene, S., Dunn, R.C., Hammel, N., Liu, Y., Krause, J., Kitade, N., Schaekermann, M., Sayres, R., Wu, D.J., Bora, A., Semturs, C., Misra, A., Huang, A.E., Spitze, A., Medeiros, F.A., Maa, A.Y., Gandhi, M., Corrado, G.S., Peng, L., Webster, D.R., 2019. Deep learning and glaucoma specialists: The relative importance of optic disc features to predict glaucoma referral in fundus photographs. Ophthalmology 126, 1627–1639. URL: https://doi.org/10.1016/j.ophtha.2019.07.024.
Playout, C., Duval, R., Cheriet, F., 2018. A multitask learning architecture for simultaneous segmentation of bright and red lesions in fundus images, in: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G.
(Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2018 - 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part II, Springer. pp. 101–108. URL: https://doi.org/10.1007/978-3-030-00934-2_12.
Playout, C., Duval, R., Cheriet, F., 2019. A novel weakly supervised multitask architecture for retinal lesions segmentation on fundus images. IEEE Trans. Med. Imaging 38, 2434–2444. URL: https://doi.org/10.1109/TMI.2019.2906319.
Pohlen, T., Hermans, A., Mathias, M., Leibe, B., 2017. Full-resolution residual networks for semantic segmentation in street scenes, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, IEEE Computer Society. pp. 3309–3318. URL: https://doi.org/10.1109/CVPR.2017.353.
Poplin, R., Varadarajan, A.V., Blumer, K., Liu, Y., McConnell, M., Corrado, G., Peng, L., Webster, D.R., 2018. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nature Biomedical Engineering 2, 158–164.
Porwal, P., Pachade, S., Kamble, R., Kokare, M., Deshmukh, G., Sahasrabuddhe, V., Mériaudeau, F., 2018. Indian diabetic retinopathy image dataset (IDRiD): A database for diabetic retinopathy screening research. Data 3, 25. URL: https://doi.org/10.3390/data3030025.
Porwal, P., Pachade, S., Kokare, M., Deshmukh, G., Son, J., Bae, W., Liu, L., Wang, J., Liu, X., Gao, L., Wu, T., Xiao, J., Wang, F., Yin, B., Wang, Y., Danala, G., He, L., Choi, Y.H., Mériaudeau, F., 2020. IDRiD: Diabetic retinopathy - segmentation and grading challenge. Medical Image Anal. 59. URL: https://doi.org/10.1016/j.media.2019.101561.
Cryotherapy for Retinopathy of Prematurity Cooperative Group, 1988. Multicenter trial of cryotherapy for retinopathy of prematurity: Preliminary results. Archives of Ophthalmology 106, 471–479.
Quellec, G., Charrière, K., Boudi, Y., Cochener, B., Lamard, M., 2017.
Deep image mining for diabetic retinopathy screening. Medical Image Anal. 39, 178–193. URL: https://doi.org/10.1016/j.media.2017.04.012.
Quellec, G., Lamard, M., Conze, P., Massin, P., Cochener, B., 2020. Automatic detection of rare pathologies in fundus photographs using few-shot learning. Medical Image Anal. 61, 101660. URL: https://doi.org/10.1016/j.media.2020.101660.
Raghavendra, U., Fujita, H., Bhandary, S.V., Gudigar, A., Tan, J.H., Acharya, U.R., 2018. Deep convolution neural network for accurate diagnosis of glaucoma using digital fundus images. Inf. Sci. 441, 41–49. URL: https://doi.org/10.1016/j.ins.2018.01.051.
Rahimy, E., 2018. Deep learning applications in ophthalmology. Current Opinion in Ophthalmology, 1.
Raj, P.K., Manjunath, A., Kumar, J.R.H., Seelamantula, C.S., 2020. Automatic classification of artery/vein from single wavelength fundus images, in: 17th IEEE International Symposium on Biomedical Imaging, ISBI 2020, Iowa City, IA, USA, April 3-7, 2020, IEEE. pp. 1262–1265. URL: https://doi.org/10.1109/ISBI45749.2020.9098580.
Redmon, J., Farhadi, A., 2018. YOLOv3: An incremental improvement. CoRR abs/1804.02767. URL: http://arxiv.org/abs/1804.02767.
Ren, S., He, K., Girshick, R.B., Sun, J., 2015. Faster R-CNN: Towards real-time object detection with region proposal networks, in: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pp. 91–99. URL: http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.
Robinson, B., 2003. Prevalence of asymptomatic eye disease. Revue Canadienne D'Optométrie, 175–180.
Ronneberger, O., Fischer, P., Brox, T., 2015.
U-Net: Convolutional networks for biomedical image segmentation, in: Navab, N., Hornegger, J., Wells III, W.M., Frangi, A.F. (Eds.), Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015 - 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III, Springer. pp. 234–241. URL: https://doi.org/10.1007/978-3-319-24574-4_28.
Roy, P.K., Tennakoon, R.B., Cao, K., Sedai, S., Mahapatra, D., Maetschke, S., Garnavi, R., 2017. A novel hybrid approach for severity assessment of diabetic retinopathy in colour fundus images, in: 14th IEEE International Symposium on Biomedical Imaging, ISBI 2017, Melbourne, Australia, April 18-21, 2017, IEEE. pp. 1078–1082. URL: https://doi.org/10.1109/ISBI.2017.7950703.
Sabour, S., Frosst, N., Hinton, G.E., 2017. Dynamic routing between capsules, in: Guyon, I., von Luxburg, U., Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pp. 3856–3866. URL: http://papers.nips.cc/paper/6975-dynamic-routing-between-capsules.
Salamat, N., Missen, M.M.S., Rashid, A., 2019. Diabetic retinopathy techniques in retinal images: A review. Artif. Intell. Medicine 97, 168–188. URL: https://doi.org/10.1016/j.artmed.2018.10.009.
Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L., 2018. Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation. CoRR abs/1801.04381. URL: http://arxiv.org/abs/1801.04381.
dos Santos Ferreira, M.V., de Carvalho Filho, A.O., de Sousa, A.D., Silva, A.C., Gattass, M., 2018. Convolutional neural network and texture descriptor-based automatic detection and diagnosis of glaucoma. Expert Syst. Appl. 110, 250–263.
URL: https://doi.org/10.1016/j.eswa.2018.06.010.
Sarhan, M.H., Albarqouni, S., Yigitsoy, M., Navab, N., Eslami, A., 2019. Multi-scale microaneurysms segmentation using embedding triplet loss, in: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P., Khan, A. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2019 - 22nd International Conference, Shenzhen, China, October 13-17, 2019, Proceedings, Part I, Springer. pp. 174–182. URL: https://doi.org/10.1007/978-3-030-32239-7_20.
Sayres, R., Taly, A., Rahimy, E., Blumer, K., Coz, D., Hammel, N., Krause, J., Narayanaswamy, A., Rastegar, Z., Wu, D., Xu, S., Barb, S., Joseph, A., Shumski, M., Smith, J., Sood, A.B., Corrado, G.S., Peng, L., Webster, D.R., 2019. Using a deep learning algorithm and integrated gradients explanation to assist grading for diabetic retinopathy. Ophthalmology 126, 552–564. URL: https://doi.org/10.1016/j.ophtha.2018.11.016.
Schmidt-Erfurth, U., Sadeghipour, A., Gerendas, B.S., Waldstein, S.M., Bogunović, H., 2018. Artificial intelligence in retina. Progress in Retinal and Eye Research 67, 1–29. URL: https://doi.org/10.1016/j.preteyeres.2018.07.004.
Sedai, S., Mahapatra, D., Hewavitharanage, S., Maetschke, S., Garnavi, R., 2017a. Semi-supervised segmentation of optic cup in retinal fundus images using variational autoencoder, in: Descoteaux, M., Maier-Hein, L., Franz, A.M., Jannin, P., Collins, D.L., Duchesne, S. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2017 - 20th International Conference, Quebec City, QC, Canada, September 11-13, 2017, Proceedings, Part II, Springer. pp. 75–82. URL: https://doi.org/10.1007/978-3-319-66185-8_9.
Sedai, S., Tennakoon, R.B., Roy, P.K., Cao, K., Garnavi, R., 2017b.
Multi-stage segmentation of the fovea in retinal fundus images using fully convolutional neural networks, in: 14th IEEE International Symposium on Biomedical Imaging, ISBI 2017, Melbourne, Australia, April 18-21, 2017, IEEE. pp. 1083–1086. URL: https://doi.org/10.1109/ISBI.2017.7950704.
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D., 2017. Grad-CAM: Visual explanations from deep networks via gradient-based localization, in: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, IEEE Computer Society. pp. 618–626. URL: https://doi.org/10.1109/ICCV.2017.74.
Sengupta, S., Singh, A., Leopold, H.A., Gulati, T., Lakshminarayanan, V., 2020. Ophthalmic diagnosis using deep learning with fundus images - A critical review. Artif. Intell. Medicine 102, 101758. URL: https://doi.org/10.1016/j.artmed.2019.101758.
Shah, S., Kasukurthi, N., Pande, H., 2019. Dynamic region proposal networks for semantic segmentation in automated glaucoma screening, in: 16th IEEE International Symposium on Biomedical Imaging, ISBI 2019, Venice, Italy, April 8-11, 2019, IEEE. pp. 578–582. URL: https://doi.org/10.1109/ISBI.2019.8759171.
Shankaranarayana, S.M., Ram, K., Mitra, K., Sivaprakasam, M., 2019. Fully convolutional networks for monocular retinal depth estimation and optic disc-cup segmentation. IEEE J. Biomed. Health Informatics 23, 1417–1426. URL: https://doi.org/10.1109/JBHI.2019.2899403.
Shen, Y., Sheng, B., Fang, R., Li, H., Dai, L., Stolte, S., Qin, J., Jia, W., Shen, D., 2020. Domain-invariant interpretable fundus image quality assessment. Medical Image Anal. 61, 101654. URL: https://doi.org/10.1016/j.media.2020.101654.
Silva, D.A.D., Liew, G., Wong, M.C., Chang, H.M., Wong, T.Y., 2009. Retinal vascular caliber and extracranial carotid disease in patients with acute ischemic stroke: the multi-centre retinal stroke (MCRS) study.
Stroke 40, 3695–3699.
Simonyan, K., Zisserman, A., 2015. Very deep convolutional networks for large-scale image recognition, in: Bengio, Y., LeCun, Y. (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. URL: http://arxiv.org/abs/1409.1556.
Sivaprasad, S., Arden, G., Prevost, A.T., Crosby-Nwaobi, R., Holmes, H., Kelly, J., Murphy, C., Rubin, G., Vasconcelos, J., Hykin, P., 2014. A multicentre phase III randomised controlled single-masked clinical trial evaluating the clinical efficacy and safety of light-masks at preventing dark-adaptation in the treatment of early diabetic macular oedema (CLEOPATRA): study protocol for a randomised controlled trial. Trials.
Sivaswamy, J., Krishnadas, S.R., Joshi, G.D., Jain, M., Tabish, A.U.S., 2014. Drishti-GS: Retinal image dataset for optic nerve head (ONH) segmentation, in: IEEE 11th International Symposium on Biomedical Imaging, ISBI 2014, April 29 - May 2, 2014, Beijing, China, IEEE. pp. 53–56. URL: https://doi.org/10.1109/ISBI.2014.6867807.
Soomro, T.A., Afifi, A.J., Gao, J., Hellwich, O., Zheng, L., Paul, M., 2019. Strided fully convolutional neural network for boosting the sensitivity of retinal blood vessels segmentation. Expert Syst. Appl. 134, 36–52. URL: https://doi.org/10.1016/j.eswa.2019.05.029.
Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.A., 2015. Striving for simplicity: The all convolutional net, in: Bengio, Y., LeCun, Y. (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Workshop Track Proceedings. URL: http://arxiv.org/abs/1412.6806.
Staal, J., Abràmoff, M.D., Niemeijer, M., Viergever, M.A., van Ginneken, B., 2004. Ridge-based vessel segmentation in color images of the retina. IEEE Trans. Medical Imaging 23, 501–509.
URL: https://doi.org/10.1109/TMI.2004.825627.
Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A., 2017. Inception-v4, Inception-ResNet and the impact of residual connections on learning, in: Singh, S.P., Markovitch, S. (Eds.), Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, AAAI Press. pp. 4278–4284. URL: http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14806.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2015. Going deeper with convolutions, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, IEEE Computer Society. pp. 1–9. URL: https://doi.org/10.1109/CVPR.2015.7298594.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016. Rethinking the inception architecture for computer vision, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, IEEE Computer Society. pp. 2818–2826. URL: https://doi.org/10.1109/CVPR.2016.308.
Tan, J.H., Bhandary, S.V., Sivaprasad, S., Hagiwara, Y., Bagchi, A., Raghavendra, U., Rao, A.K., Raju, B., Shetty, N.S., Gertych, A., Chua, K.C., Acharya, U.R., 2018. Age-related macular degeneration detection using deep convolutional neural network. Future Gener. Comput. Syst. 87, 127–135. URL: https://doi.org/10.1016/j.future.2018.05.001.
Tan, J.H., Fujita, H., Sivaprasad, S., Bhandary, S.V., Rao, A.K., Chua, K.C., Acharya, U.R., 2017. Automated segmentation of exudates, haemorrhages, microaneurysms using single convolutional neural network. Inf. Sci. 420, 66–76. URL: https://doi.org/10.1016/j.ins.2017.08.050.
Tasman, W., Patz, A., Mcnamara, J.A., Kaiser, R.S., Trese, M.T., Smith, B.T., 2006. Retinopathy of prematurity: The life of a lifetime disease.
American Journal of Ophthalmology 141, 167–174.
Taylor, S., Brown, J.M., Gupta, K., Campbell, J.P., Ostmo, S., Chan, R.V.P., Dy, J., Erdogmus, D., Ioannidis, S., Kim, S.J., Kalpathy-Cramer, J., Chiang, M.F., for the Imaging and Informatics in Retinopathy of Prematurity Consortium, 2019. Monitoring Disease Progression With a Quantitative Severity Scale for Retinopathy of Prematurity Using Deep Learning. JAMA Ophthalmology 137, 1022–1028. URL: https://doi.org/10.1001/jamaophthalmol.2019.2433.
Tham, Y.C., Li, X., Wong, T.Y., Quigley, H.A., Aung, T., Cheng, C., 2014. Global prevalence of glaucoma and projections of glaucoma burden through 2040: a systematic review and meta-analysis. Ophthalmology 121, 2081–2090.
The Age-Related Eye Disease Study Research Group, 1999. The Age-Related Eye Disease Study (AREDS): Design implications. AREDS report no. 1. Controlled Clinical Trials 20, 573–600. URL: https://doi.org/10.1016/S0197-2456(99)00031-8.
Ting, D., Pasquale, L., Peng, L., Campbell, J., Lee, A., Raman, R., Tan, G., Schmetterer, L., Keane, P., Wong, T., 2018a. Artificial intelligence and deep learning in ophthalmology. British Journal of Ophthalmology 103.
Ting, D.S., Peng, L., Varadarajan, A.V., Keane, P.A., Burlina, P.M., Chiang, M.F., Schmetterer, L., Pasquale, L.R., Bressler, N.M., Webster, D.R., Abramoff, M., Wong, T.Y., 2019. Deep learning in ophthalmology: The technical and clinical considerations. Progress in Retinal and Eye Research 72, 100759. URL: https://doi.org/10.1016/j.preteyeres.2019.04.003.
Ting, D.S.W., Cheung, C.Y.L., Lim, G., Tan, G.S.W., Quang, N.D., Gan, A., Hamzah, H., Garcia-Franco, R., San Yeo, I.Y., Lee, S.Y., Wong, E.Y.M., Sabanayagam, C., Baskaran, M., Ibrahim, F., Tan, N.C., Finkelstein, E.A., Lamoureux, E.L., Wong, I.Y., Bressler, N.M., Sivaprasad, S., Varma, R., Jonas, J.B., He, M.G., Cheng, C.Y., Cheung, G.C.M., Aung, T., Hsu, W., Lee, M.L., Wong, T.Y., 2017.
Development and Validation of a Deep Learning System for Diabetic Retinopathy and Related Eye Diseases Using Retinal Images From Multiethnic Populations With Diabetes. JAMA 318, 2211–2223. URL: https://doi.org/10.1001/jama.2017.18152.
Ting, D.S.W., Pasquale, L.R., Peng, L., Campbell, J.P., Lee, A.Y., Raman, R., Tan, G.S.W., Schmetterer, L., Keane, P.A., Wong, T.Y., 2018b. Artificial intelligence and deep learning in ophthalmology. British Journal of Ophthalmology.
Tu, Z., Gao, S., Zhou, K., Chen, X., Fu, H., Gu, Z., Cheng, J., Yu, Z., Liu, J., 2020. SUNet: A lesion regularized model for simultaneous diabetic retinopathy and diabetic macular edema grading, in: 17th IEEE International Symposium on Biomedical Imaging, ISBI 2020, Iowa City, IA, USA, April 3-7, 2020, IEEE. pp. 1378–1382. URL: https://doi.org/10.1109/ISBI45749.2020.9098673.
V, S.A., Sivaswamy, J., 2019. Matching the characteristics of fundus and smartphone camera images, in: 16th IEEE International Symposium on Biomedical Imaging, ISBI 2019, Venice, Italy, April 8-11, 2019, IEEE. pp. 569–572. URL: https://doi.org/10.1109/ISBI.2019.8759381.
Varadarajan, A.V., Poplin, R., Blumer, K., Angermüller, C., Ledsam, J., Chopra, R., Keane, P.A., Corrado, G., Peng, L., Webster, D.R., 2017. Deep learning for predicting refractive error from retinal fundus images. CoRR abs/1712.07798. URL: http://arxiv.org/abs/1712.07798.
Vos, T., Allen, C., Arora, M., Barber, R.M., Bhutta, Z.A., Brown, A., Carter, A., Casey, D.C., et al., 2016. Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990–2015: a systematic analysis for the global burden of disease study 2015. The Lancet 388, 1545–1602. URL: https://doi.org/10.1016/S0140-6736(16)31678-6.
Wang, B., Qiu, S., He, H., 2019a. Dual encoding U-Net for retinal vessel segmentation, in: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P., Khan, A.
(Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2019 - 22nd International Conference, Shenzhen, China, October 13-17, 2019, Proceedings, Part I, Springer. pp. 84–92. URL: https://doi.org/10.1007/978-3-030-32239-7_10.
Wang, K., Zhang, X., Huang, S., Wang, Q., Chen, F., 2020. CTF-Net: Retinal vessel segmentation via deep coarse-to-fine supervision network, in: 17th IEEE International Symposium on Biomedical Imaging, ISBI 2020, Iowa City, IA, USA, April 3-7, 2020, IEEE. pp. 1237–1241. URL: https://doi.org/10.1109/ISBI45749.2020.9098742.
Wang, M., Deng, W., 2018. Deep visual domain adaptation: A survey. Neurocomputing 312, 135–153. URL: https://doi.org/10.1016/j.neucom.2018.05.083.
Wang, S., Yu, L., Li, K., Yang, X., Fu, C., Heng, P., 2019b. Boundary and entropy-driven adversarial learning for fundus image segmentation, in: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P., Khan, A. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2019 - 22nd International Conference, Shenzhen, China, October 13-17, 2019, Proceedings, Part I, Springer. pp. 102–110. URL: https://doi.org/10.1007/978-3-030-32239-7_12.
Wang, S., Yu, L., Yang, X., Fu, C., Heng, P., 2019c. Patch-based output space adversarial learning for joint optic disc and cup segmentation. IEEE Trans. Med. Imaging 38, 2485–2495. URL: https://doi.org/10.1109/TMI.2019.2899910.
Wang, X., Ju, L., Zhao, X., Ge, Z., 2019d. Retinal abnormalities recognition using regional multitask learning, in: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P., Khan, A. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2019 - 22nd International Conference, Shenzhen, China, October 13-17, 2019, Proceedings, Part I, Springer. pp. 30–38. URL: https://doi.org/10.1007/978-3-030-32239-7_4.
Wang, X., Xu, M., Li, L., Wang, Z., Guan, Z., 2019e.
Pathology-aware deep network visualization and its application in glaucoma image synthesis, in: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P., Khan, A. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2019 - 22nd International Conference, Shenzhen, China, October 13-17, 2019, Proceedings, Part I, Springer. pp. 423–431. URL: https://doi.org/10.1007/978-3-030-32239-7_47.
Wang, Z., Dong, N., Rosario, S.D., Xu, M., Xie, P., Xing, E.P., 2019f. Ellipse detection of optic disc-and-cup boundary in fundus images, in: 16th IEEE International Symposium on Biomedical Imaging, ISBI 2019, Venice, Italy, April 8-11, 2019, IEEE. pp. 601–604. URL: https://doi.org/10.1109/ISBI.2019.8759173.
Wang, Z., Yin, Y., Shi, J., Fang, W., Li, H., Wang, X., 2017. Zoom-in-Net: Deep mining lesions for diabetic retinopathy detection, in: Descoteaux, M., Maier-Hein, L., Franz, A.M., Jannin, P., Collins, D.L., Duchesne, S. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2017 - 20th International Conference, Quebec City, QC, Canada, September 11-13, 2017, Proceedings, Part III, Springer. pp. 267–275. URL: https://doi.org/10.1007/978-3-319-66179-7_31.
Wilkinson, C.P., Ferris III, F.L., Klein, R.E., Lee, P.P., Agardh, C.D., Davis, M., Dills, D., Kampik, A., Pararajasegaram, R., Verdaguer, J.T., 2003. Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales. Ophthalmology 110, 1677–1682.
Wu, Y., Xia, Y., Song, Y., Zhang, D., Liu, D., Zhang, C., Cai, W., 2019. Vessel-Net: Retinal vessel segmentation under multi-path supervision, in: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P., Khan, A. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2019 - 22nd International Conference, Shenzhen, China, October 13-17, 2019, Proceedings, Part I, Springer. pp. 264–272.
URL: https://doi.org/10.1007/978-3-030-32239-7_30.
Wu, Y., Xia, Y., Song, Y., Zhang, Y., Cai, W., 2018. Multiscale network followed network model for retinal vessel segmentation, in: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2018 - 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part II, Springer. pp. 119–126. URL: https://doi.org/10.1007/978-3-030-00934-2_14.
Wu, Y., Xia, Y., Song, Y., Zhang, Y., Cai, W., 2020. NFN+: A novel network followed network for retinal vessel segmentation. Neural Networks 126, 153–162. URL: https://doi.org/10.1016/j.neunet.2020.02.018.
Xie, S., Tu, Z., 2015. Holistically-nested edge detection, in: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1395–1403.
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A.C., Salakhutdinov, R., Zemel, R.S., Bengio, Y., 2015. Show, attend and tell: Neural image caption generation with visual attention, in: Bach, F.R., Blei, D.M. (Eds.), Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, JMLR.org. pp. 2048–2057. URL: http://proceedings.mlr.press/v37/xuc15.html.
Xu, X., Zhang, L., Li, J., Guan, Y., Zhang, L., 2020. A hybrid global-local representation CNN model for automatic cataract grading. IEEE J. Biomed. Health Informatics 24, 556–567. URL: https://doi.org/10.1109/JBHI.2019.2914690.
Xue, J., Yan, S., Qu, J., Qi, F., Qiu, C., Zhang, H., Chen, M., Liu, T., Li, D., Liu, X., 2019. Deep membrane systems for multitask segmentation in diabetic retinopathy. Knowl. Based Syst. 183. URL: https://doi.org/10.1016/j.knosys.2019.104887.
Yan, F., Cui, J., Wang, Y., Liu, H., Liu, H., Wei, B., Yin, Y., Zheng, Y., 2018a. Deep random walk for drusen segmentation from fundus images, in: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G.
(Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2018 - 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part II, Springer. pp. 48–55. URL: https://doi.org/10.1007/978-3-030-00934-2_6.
Yan, Z., Han, X., Wang, C., Qiu, Y., Xiong, Z., Cui, S., 2019a. Learning mutually local-global U-Nets for high-resolution retinal lesion segmentation in fundus images, in: 16th IEEE International Symposium on Biomedical Imaging, ISBI 2019, Venice, Italy, April 8-11, 2019, IEEE. pp. 597–600. URL: https://doi.org/10.1109/ISBI.2019.8759579.
Yan, Z., Yang, X., Cheng, K., 2018b. Joint segment-level and pixel-wise losses for deep learning based retinal vessel segmentation. IEEE Trans. Biomed. Engineering 65, 1912–1923. URL: https://doi.org/10.1109/TBME.2018.2828137.
Yan, Z., Yang, X., Cheng, K., 2019b. A three-stage deep learning model for accurate retinal vessel segmentation. IEEE J. Biomed. Health Informatics 23, 1427–1436. URL: https://doi.org/10.1109/JBHI.2018.2872813.
Yang, Y., Li, T., Li, W., Wu, H., Fan, W., Zhang, W., 2017. Lesion detection and grading of diabetic retinopathy via two-stages deep convolutional neural networks, in: Descoteaux, M., Maier-Hein, L., Franz, A.M., Jannin, P., Collins, D.L., Duchesne, S. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2017 - 20th International Conference, Quebec City, QC, Canada, September 11-13, 2017, Proceedings, Part III, Springer. pp. 533–540. URL: https://doi.org/10.1007/978-3-319-66179-7_61.
Yin, P., Wu, Q., Xu, Y., Min, H., Yang, M., Zhang, Y., Tan, M., 2019. PM-Net: Pyramid multi-label network for joint optic disc and cup segmentation, in: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P., Khan, A. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2019 - 22nd International Conference, Shenzhen, China, October 13-17, 2019, Proceedings, Part I, Springer. pp.
129–137. URL: https://doi.org/10.1007/978-3-030-32239-7_15.
Yu, F., Zhao, J., Gong, Y., Wang, Z., Li, Y., Yang, F., Dong, B., Li, Q., Zhang, L., 2019. Annotation-free cardiac vessel segmentation via knowledge transfer from retinal images, in: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P., Khan, A. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2019 - 22nd International Conference, Shenzhen, China, October 13-17, 2019, Proceedings, Part II, Springer. pp. 714–722. URL: https://doi.org/10.1007/978-3-030-32245-8_79.
Yu, L., Qin, Z., Zhuang, T., Ding, Y., Qin, Z., Choo, K.R., 2020. A framework for hierarchical division of retinal vascular networks. Neurocomputing 392, 221–232. URL: https://doi.org/10.1016/j.neucom.2018.11.113.
Zhang, S., Fu, H., Yan, Y., Zhang, Y., Wu, Q., Yang, M., Tan, M., Xu, Y., 2019a. Attention guided network for retinal image segmentation, in: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P., Khan, A. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2019 - 22nd International Conference, Shenzhen, China, October 13-17, 2019, Proceedings, Part I, Springer. pp. 797–805. URL: https://doi.org/10.1007/978-3-030-32239-7_88.
Zhang, W., Zhong, J., Yang, S., Gao, Z., Hu, J., Chen, Y., Yi, Z., 2019b. Automated identification and grading system of diabetic retinopathy using deep neural networks. Knowl. Based Syst. 175, 12–25. URL: https://doi.org/10.1016/j.knosys.2019.03.016.
Zhang, Y., Chung, A.C.S., 2018. Deep supervision with additional labels for retinal vessel segmentation task, in: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2018 - 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part II, Springer. pp.
83–91. URL: https://doi.org/10.1007/978-3-030-00934-2_10.
Zhang, Z., Liu, J., Yin, F., Lee, B., Wong, D.W.K., Sung, K.R., 2013. ACHIKO-K: Database of fundus images from glaucoma patients, in: 2013 IEEE 8th Conference on Industrial Electronics and Applications (ICIEA), pp. 228–231.
Zhang, Z., Srivastava, R., Liu, H., Chen, X., Duan, L., Wong, D.W.K., Kwoh, C.K., Wong, T.Y., Liu, J., 2014. A survey on computer aided diagnosis for ocular diseases. BMC Med. Inf. & Decision Making 14, 80. URL: https://doi.org/10.1186/1472-6947-14-80.
Zhang, Z., Yin, F.S., Liu, J., Wong, W.K., Tan, N.M., Lee, B.H., Cheng, J., Wong, T.Y., 2010. ORIGA-light: An online retinal fundus image database for glaucoma analysis and research, in: 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, pp. 3065–3068.
Zhao, H., Li, H., Cheng, L., 2020a. Improving retinal vessel segmentation with joint local loss by matting. Pattern Recognit. 98. URL: https://doi.org/10.1016/j.patcog.2019.107068.
Zhao, H., Li, H., Maurer-Stroh, S., Cheng, L., 2018. Synthesizing retinal and neuronal images with generative adversarial nets. Medical Image Anal. 49, 14–26. URL: https://doi.org/10.1016/j.media.2018.07.001.
Zhao, H., Li, H., Maurer-Stroh, S., Guo, Y., Deng, Q., Cheng, L., 2019a. Supervised segmentation of un-annotated retinal fundus images by synthesis. IEEE Trans. Medical Imaging 38, 46–56. URL: https://doi.org/10.1109/TMI.2018.2854886.
Zhao, H., Yang, B., Cao, L., Li, H., 2019b. Data-driven enhancement of blurry retinal images via generative adversarial networks, in: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P., Khan, A. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2019 - 22nd International Conference, Shenzhen, China, October 13-17, 2019, Proceedings, Part I, Springer. pp. 75–83.
URL: https://doi.org/10.1007/978-3-030-32239-7_9.
Zhao, R., Chen, X., Liu, X., Chen, Z., Guo, F., Li, S., 2020b. Direct cup-to-disc ratio estimation for glaucoma screening via semi-supervised learning. IEEE J. Biomed. Health Informatics 24, 1104–1113. URL: https://doi.org/10.1109/JBHI.2019.2934477.
Zhao, R., Chen, Z., Liu, X., Zou, B., Li, S., 2019c. Multi-index optic disc quantification via multitask ensemble learning, in: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P., Khan, A. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2019 - 22nd International Conference, Shenzhen, China, October 13-17, 2019, Proceedings, Part I, Springer. pp. 21–29. URL: https://doi.org/10.1007/978-3-030-32239-7_3.
Zhao, R., Li, S., 2020. Multi-indices quantification of optic nerve head in fundus image via multitask collaborative learning. Medical Image Anal. 60. URL: https://doi.org/10.1016/j.media.2019.101593.
Zhao, R., Liao, W., Zou, B., Chen, Z., Li, S., 2019d. Weakly-supervised simultaneous evidence identification and segmentation for automated glaucoma diagnosis, in: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, AAAI Press. pp. 809–816. URL: https://doi.org/10.1609/aaai.v33i01.3301809.
Zhao, Z., Zhang, K., Hao, X., Tian, J., Chua, M.C.H., Chen, L., Xu, X., 2019e. BiRA-Net: Bilinear attention net for diabetic retinopathy grading, in: 2019 IEEE International Conference on Image Processing, ICIP 2019, Taipei, Taiwan, September 22-25, 2019, IEEE. pp. 1385–1389. URL: https://doi.org/10.1109/ICIP.2019.8803074.
Zheng, Y., Cheng, C., Lamoureux, E.L., Chiang, P.P., Anuar, A.R., Wang, J.J., Mitchell, P., Saw, S., Wong, T.Y., 2013.
How much eye care services do Asian populations need? Projection from the Singapore Epidemiology of Eye Disease (SEED) study. Investigative Ophthalmology & Visual Science 54, 2171–2177.
Zhou, B., Khosla, A., Lapedriza, À., Oliva, A., Torralba, A., 2016. Learning deep features for discriminative localization, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, IEEE Computer Society. pp. 2921–2929. URL: https://doi.org/10.1109/CVPR.2016.319.
Zhou, Y., He, X., Cui, S., Zhu, F., Liu, L., Shao, L., 2019. High-resolution diabetic retinopathy image synthesis manipulated by grading and lesions, in: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P., Khan, A. (Eds.), Medical Image Computing and Computer Assisted Intervention - MICCAI 2019 - 22nd International Conference, Shenzhen, China, October 13-17, 2019, Proceedings, Part I, Springer. pp. 505–513. URL: https://doi.org/10.1007/978-3-030-32239-7_56.
Zhou, Y., Li, G., Li, H., 2020. Automatic cataract classification using deep neural network with discrete state transition. IEEE Trans. Med. Imaging 39, 436–446. URL: https://doi.org/10.1109/TMI.2019.2928229.
Zhou, Z., 2018. A brief introduction to weakly supervised learning. National Science Review 5, 44–53.
Zhu, J., Park, T., Isola, P., Efros, A.A., 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks, in: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2242–2251.
Zou, B., He, Z., Zhao, R., Zhu, C., Liao, W., Li, S., 2020. Non-rigid retinal image registration using an unsupervised structure-driven regression network. Neurocomputing 404, 14–25. URL: https://doi.org/10.1016/j.neucom.2020.04.122.