A Research Agenda on Pediatric Chest X-Ray: Is Deep Learning Still in Childhood?
Afonso U. Fonseca, Gabriel S. Vieira, Fabrízzio A. A. M. N. Soares, Renato F. Bulcão-Neto
Afonso U. Fonseca∗, Gabriel S. Vieira†, Fabrízzio A. A. M. N. Soares‡∗, Renato F. Bulcão-Neto∗
∗Federal University of Goiás, Pixelab Laboratory, Goiânia/GO, Brazil
†Instituto Federal Goiano, Computer Vision Lab, Urutaí/GO, Brazil
‡Department of Computer Science, Southern Oregon University, Ashland/OR, USA
Email: {afonsoueslei, rbulcao}@ufg.br, [email protected], [email protected]

Abstract—Despite advances in medical image acquisition and computer-aided support techniques, X-rays, due to their low cost, high availability, and low radiation levels, remain an important diagnostic procedure, constituting the most frequently performed radiographic examination in pediatric patients for disease investigation, while researchers look for increasingly efficient techniques to support decision-making. Emerging in the last decade as a viable alternative, deep learning (DL), a technique inspired by neuroscience and neural connections, has gained much attention from researchers and made significant advances in the field of medical imaging, outperforming the state of the art of many techniques, including those applied to pediatric chest radiography (PCXR). Given this scenario, and considering that, as far as we know, there is still no mapping study on the application of deep learning techniques to PCXR images, we propose in this article a "deep radiography" of the last decade of this research topic and a preliminary research agenda that addresses the state of the art of applying DL to PCXR, constituting a collaborative tool for future researchers. Our goal is to identify primary studies and support the process of choosing and developing DL techniques applied to PCXR images, in addition to pointing out gaps and trends by drawing up a preliminary research agenda. A protocol is described in each phase, detailing the criteria used from selection to extraction, and our set of selected studies is subjected to careful analysis to answer the research form. Six source databases were used, and the synthesis, results, limitations, and conclusions are presented.
Index Terms—Systematic mapping, Deep learning, Neural network, CNN, Pediatric, X-ray, CXR, Chest, Lung, Thorax
I. INTRODUCTION
Children between 0 and 14 years old account for more than 25% of the world population [1]; asthma affects 14% of children and has been increasing [2]; and of all deaths among children under 5, 18% (about 1.4 million a year) are caused by pneumonia, while respiratory diseases are among the leading causes of child death in the world, affecting mainly residents of underdeveloped countries with few resources [3].

Nowadays, it is impossible to address any pediatric pathology without the support and full analysis of a pediatric radiological study; however, many countries do not offer training dedicated to pediatric radiology, and there is a global shortage of pediatric professionals. The causes range from low pay and the need to be always available to the high specialization required, which does not attract new residents to include this sub-specialty among their main options [4].

Chest radiography (CXR) is not the most modern or accurate imaging diagnosis, and its use has several limitations, mainly related to its two-dimensional nature, which can lead to consolidation, adenopathy, or complications being masked by other anatomical structures, such as the heart, mediastinum, and diaphragm, and can also lead to the problem of summation shadows [5]. Nevertheless, in many cases, CXR is preferred over other more modern and accurate imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (USG), as it provides high resolution at a very small dose of ionizing radiation and is a low-cost test with high availability and easy acquisition, even in peripheral regions. It has been the initial test for the investigation or exclusion of many diseases, even those that require other types of imaging or exams: about 350 million radiographs are performed on children alone, while 40% of all pediatric images consist of CXR [6], [7], [8].

Children are not to be considered "little adults". Thus, medical examinations in children have to be different from those in adults.
This is particularly true for pediatric X-ray examinations. Children differ from adults regarding anthropometry, anatomy and physiology, psychology, radiation biology, and radiation risk [9]. However, advances in pediatric radiology are always based on adult radiology, and the protocols designed for adults pose technical challenges when applied to children. In PCXR, for example, these challenges are related to the smaller size of the examined area and to differences in certain functions, such as the increased heart rate in neonates [8].

In addition, pediatric images, unlike images of adult patients, present other challenges for both human and computational interpretation, because they are affected, for example, by acquisition environments and equipment not suited to this audience, the patients' limited cooperation with positioning and maneuvers, insufficient inspiration, greater variation in anatomical structures and disease patterns, strict adherence to the ALARA principle (As Low As Reasonably Achievable), and the frequent presence of artifacts [10], [11], [12].

Another problem is the more limited number of studies and databases specifically related to pediatric chest images; cancer studies, for example, even with a growing body of literature on the subject, generally involve adult patients, with limited pediatric-specific knowledge [13].

Accurate diagnosis and attribution of the causes of a disease are important to mitigate its burden, implement appropriate prevention or treatment strategies, and develop more effective interventions, which directly affect the efficiency and cost of treatment [5].
Therefore, there is an urgent need to develop agile and reliable software and hardware solutions to assist in the long, difficult, and expensive task of diagnosing an increasing number of images, especially considering the limited number of experienced radiologists [14].

Over the past few decades, medical imaging techniques such as CT, MRI, PET, USG, and CXR have been used for the early detection, diagnosis, and treatment of various diseases [15]. Meanwhile, computational medical image analysis has become a prominent field of research at the intersection of Informatics, Computational Sciences, and Medicine, supported by a vibrant community of researchers working in academia, industry, and research centers [16].

Machine learning methods have brought a revolution to the field of computer vision, effectively solving many problems that had long remained unresolved, and DL is now becoming the dominant approach, with very promising results in many areas that extend to medical images. Deep learning (DL) is a sub-area of machine learning based on a model (a neural network) that mimics the workings of the human brain in processing data and creating patterns for use in decision making [17]. This process, in which the computer acts as human experts do in defining the feature sets to be extracted from the images, is a complete paradigm shift that has been called by some "the end of code" [18].

Although it appeared in the 1980s [19], only recently has DL emerged as a promising computational technique for a wide range of research areas, including the medical field, which has extensively used DL frameworks for multiple-organ detection [20], [21], classification [22], and segmentation tasks [23], [24].
The most important reasons for this are the advances in hardware now available, especially the parallel processing of computers with Graphics Processing Units (GPUs), the development of new techniques designed for more efficient deep network training, and the availability of much more training data, thus allowing the use of DL's full potential [25], [18].

In [26], an interesting discussion is presented on the use of machine learning and artificial intelligence and their implications in radiology, ranging from a description of the types of learning to the challenges of their implementation for children's images, which include technical and regulatory obstacles, as well as the opaque character of convolutional neural networks (CNNs).

Litjens et al. [27] presented a broad review of the main DL concepts pertinent to the analysis of medical images, including CXR, while Ginneken [18], in another review covering the past fifty years of techniques applied to chest imaging, showed that machine learning has become the dominant technology for tackling CAD in the lungs, further indicating that with DL even better results can be achieved.

Although several primary studies and a few secondary studies have addressed DL on PCXR, as far as we know, none of them is a systematic mapping (SM). SM or scoping studies are used by many researchers in a number of areas, following different guidelines or methods. These studies are designed to give an overview of a research area by classifying and counting contributions in relation to the categories of that classification [28]. In addition, a well-documented SM study allows its reproduction by other researchers and further discussion of the topic under analysis.

Petersen et al. [29] proposed that a mapping study preceding a systematic review provides a valuable baseline. Kitchenham et al. [30], [31] observed multiple benefits in doing systematic maps, such as time savings for follow-up studies
(e.g., due to reuse of study protocols), a good overview of an area and the ability to identify research gaps, visualization of research trends, identification of related work, etc. Kitchenham et al. [30], [31] also pointed out that it is important to have a well-defined and reliable classification scheme.

This article provides a research agenda on a hot topic of great attention and interest (see Figure 1) and is based on an SM. A broad understanding of the application of DL techniques to pediatric chest X-ray images is presented, highlighting its limitations, gaps, and future trends, supported by the selection and synthesis of closed primary studies on this subject.

Fig. 1: Interest over the last decade in "deep learning" and "chest x-ray". Numbers represent search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term; a value of 50 means that the term is half as popular; a score of 0 means there was not enough data for this term.

The objective of this research agenda is therefore to contribute the following items:
• point directly to the same conclusions as those verified in the reviews, with a summary based on the extraction of data from a comprehensive research form;
• support, through statistical data, the process of choosing and developing DL techniques applied to PCXR;
• point out the maturity level of DL techniques in each of the tasks applied to PCXR;
• indicate new bottlenecks and trends not yet pointed out by the reviews;
• provide a detailed SM process that allows for its reproduction, updates, and further developments.

Our protocol, based on the one described by Felizardo et al. [32], is described phase by phase, detailing the criteria used from selection to extraction. This work covers studies published between 2010 and 2020, which were selected and subjected to careful analysis to answer the research form.
Six source databases were used, and the synthesis, results, limitations, and conclusions are presented.

The remainder of this paper is organized as follows. Section II describes work related to our proposal. Section III details the study protocol applied. Section IV presents the mapping study results from the extracted data and discusses the main observations found. Section V introduces a research agenda to guide future work, and Section VI discusses the validity threats of this work and elaborates on their mitigation. Finally, Section VII suggests directions for future work.

II. RELATED WORK
Computational methods applied to CXR, especially in the field of computer vision, have long been of great interest to the scientific community and have more recently improved their results through the use of DL techniques. In this section, we present work related to the research proposed in this article.

Litjens et al. [27] analyzed the main DL concepts applied to the analysis of medical images and summarized more than 300 contributions, grouping them into image classification, object detection, segmentation, registration, and other tasks. The authors concisely present studies by area of application and the growing interest in the application of DL, such as the lung cancer detection challenge on CXR of the Kaggle Data Science Bowl 2017, with US$1 million in prizes and more than one thousand participating teams. In the end, the authors noted that DL will not only have a great impact on medical image analysis but on medical imaging as a whole.

Ginneken [18], in a rich article, reviews the literature of the last 50 years and shows the evolution of various computational analysis techniques in chest imaging, from rule-based methods to DL, and the point at which the latter became the primary choice for image analysis. While observing various DL models, Ginneken discusses only convolutional networks (convnets). He explains why convnets, although not recent among image analysis techniques, only gained momentum from 2012, pointing to the following reasons: (1) new techniques designed for more efficient deep network training; (2) the availability of much more training data; (3) advances in the parallel processing of computers with GPUs.
Ginneken's study adequately addresses DL and CXR among other subjects, but most of it concerns CT images.

Likewise, Koichiro Yasaka and Osamu Abe [33] present an interesting review of DL and artificial intelligence in radiology, with important highlights of various applications of DL that can aid the detection, diagnosis, staging, and sub-classification of conditions in radiological images. They also point to the limitations of DL, such as the poor readability and interpretability of the features and calculations that models use to make a classification, which makes it very difficult to resolve conflicts when the judgment of physicians or radiologists differs from that of trained models.

Lee et al. [34], in their review, investigated the application of DL to CXR and CT images, highlighting its ability to deal with new information, an essential limitation of computer-aided detection. They also point out that, while DL has shown impressive advances in many fields, in the specific medical field this technique is still in its infancy. According to the authors, several studies show that DL approaches have high potential to overcome the limitations of existing CAD systems, but there is still concern about this technology in terms of clinical application.

In a more recent study, Tajbakhsh et al. [35] reviewed DL techniques applied specifically to the segmentation of medical images, raising questions mainly related to the scarcity and quality of data sets. They compare current methodologies, their benefits and requirements, and ultimately recommend solutions to address each of the limitations raised.

Reviews by these authors, even if not systematic, are very important in helping other researchers understand issues such as the current state of the art in the area, its limitations, its potential, and future directions.
Despite the undeniable value of such reviews, a systematic review offers more because, in addition to providing strong evidence on a specific topic by identifying, analyzing, interpreting, and summarizing that evidence clearly and objectively, it also allows other researchers to reproduce it, which is very important [32].

Pande et al. [36] provide a systematic review of computer-assisted detection of pulmonary tuberculosis on digital CXR. Their systematic review is one of the few available on this topic (if not the only one), which shows that efforts in this regard can make a valuable contribution. The work by Pande et al. [36] covered papers published between 1 January 2010 and 31 December 2015 and used four sources (PubMed, EMBASE, SCOPUS, and Engineering Village) with a sensitive search strategy formulated in consultation with a medical librarian. Of the 455 articles returned from their four research sources, after applying inclusion and exclusion criteria, only 5 remained, which were used for extraction and synthesis. The extraction process was conducted by two independent reviewers using a standardized form. In the end, the authors point out that, while limited by the small number of studies, the evidence was that most had methodological limitations, only one software program was available and evaluated, and generalization was possible only for environments where PTB and HIV are less prevalent; therefore, further research was needed.

While systematic reviews aim at synthesizing evidence, also considering the strength of evidence, SMs are primarily concerned with structuring a research area [28]. Therefore, our review work differs from related reviews in two ways. First, it deals specifically with the application of DL to pediatric chest radiographs; second, to the best of our knowledge, it is the first research agenda supported by an SM study applied to this topic.

III. STUDY PROTOCOL
Although SMs resemble systematic reviews in many respects, they are not the same: while the latter aim to synthesize evidence, also considering the strength of that evidence, SMs or scoping studies are primarily concerned with structuring a research area and aim to give an overview of it through the classification and counting of contributions in relation to the categories of that classification [28], [37].

To conduct this SM we followed the phases described by Felizardo et al. [32]: planning, conducting, and publishing the results (see Figure 2). The StArt tool [38] was also used to support the management of this systematic study.

Fig. 2: Phases and activities of this study, adapted from [32].
A. Planning
Before beginning the planning phase, we searched for surveys and secondary studies related to our proposal with objectives similar to those we had in mind. This step was performed for two purposes: first, to check whether there were any systematic reviews on the same research topic; second, to evaluate the performance of the search string. Later, the results of this step were also used to guide a reverse snowballing technique, which consists of evaluating the reference list of a relevant primary study in search of other relevant primary studies [32].

The planning phase is an iterative process that goes from the objective statement to the evaluation, which is used to make adjustments if necessary.
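The reverse snowballing step described above amounts to a simple set operation over reference lists. The following sketch is purely illustrative (the study identifiers and reference data are hypothetical, not drawn from this study):

```python
# Illustrative sketch of reverse snowballing: collect studies cited by the
# already-selected relevant studies that are not yet in the selected set,
# so they can be screened as new candidate primary studies.
def reverse_snowball(selected, references):
    """selected: set of study ids; references: dict id -> list of cited ids."""
    candidates = set()
    for study in selected:
        for cited in references.get(study, []):
            if cited not in selected:
                candidates.add(cited)
    return candidates

# Hypothetical reference lists for two selected studies
refs = {"S18": ["S3", "S10", "P99"], "S3": ["P42"]}
print(sorted(reverse_snowball({"S18", "S3"}, refs)))  # → ['P42', 'P99', 'S10']
```

Each candidate returned by such a pass would then be screened against the selection criteria like any other search result.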
1) Formulating the research questions:
The definition of the purpose of our study is adapted from the PICO criteria [39] derived from medicine. The structure is described in Table I. A Research Question (RQ) is the fundamental core of a research project, study, or literature review, because it helps us focus on what matters for the study at hand and also guides the extraction phase of the process [40]. We have therefore defined our main research question as:
RQ1:
What is the state of the art of DL on PCXR image tasks?
TABLE I: PICO Analysis
Population: Publications about DL applications on digital and conventional pediatric chest radiographs, considering all types of industries, systems, and application domains.
Intervention: Tasks on chest radiographs that use any DL solution.
Comparison: Not applicable: our intention is to classify the tasks performed on pediatric chest radiographs and the DL methods used on them, not to compare methods with other methods or processes.
Outcome: Overview of the context of DL solutions for tasks of pediatric chest radiograph image processing, such as diagnosis, segmentation, enhancement, removal of artifacts, suppression of bone structures, reconstruction, registration, etc.
The objective of this question is to identify the level of maturity of DL solutions applied to the canonical tasks in PCXR images (classification, detection, segmentation, registration, retrieval, image generation, enhancement) and to investigate tasks not yet addressed by DL solutions. In addition, secondary research questions were used to better guide the other stages of this research. In this sense, the RQs proposed for this SM are as follows:
RQ2:
Which tasks applied to pediatric chest radiograph imaging are most addressed by deep learning techniques?
Its purpose is to explain which tasks applied to CXR are more covered by DL and which are not: tasks such as classification, diagnosis, enhancement, segmentation, object recognition, localization, detection, and prediction/prognosis, to name a few.
RQ3:
What are the metrics used for assessment?
Its purpose is to answer whether there are assessment metrics adopted as a standard in each task.
RQ4:
What are the main datasets used in this research field and how are they organized?
Its purpose is to answer which datasets are available in this research field, whether they are public or private, their sizes, the CXR types, and whether they contain additional information such as reports, other types of images, etc.
RQ5:
Did the work have ethics committee authorization?
Its purpose is to know whether authorization by ethics committees is a practice in this research field, and the reasons why or why not.
RQ6:
What are the neural network architectures used in theworks?
Its purpose is to answer whether there is a dominant DL architecture for PCXR.
RQ7:
When and in which type of venue were the articles published?
Its purpose is to understand in which venues and along what timeline the studies in this research field are published.
RQ8:
What are the details of the types of data and processes applied in the DL techniques?
Its purpose is to answer which training techniques and which learning and processing approaches are used, and whether the use of preprocessing steps is common.
RQ9:
Which type of contribution results from the study?
Its purpose is to identify how the contribution brought by the study is classified: algorithm, application, framework, or product.
RQ10:
Is there any international standard and is it applied?
Its purpose is to know whether the studies in this field of research adopt any international standard and, if so, which one.
RQ11:
How is the study classified?
Its purpose is to identify how the studies are classified based on the classification proposed by Petersen et al. [28].
RQ12:
Which research method was adopted?
Its purpose is to identify how the research methods are classified based on the classification proposed by Petersen et al. [28].
2) Pilot Search and Search String:
Responding to these RQs requires an appropriate research strategy based on the most relevant primary studies. To achieve this goal, the first step is to conduct a pilot search that finds a search string balancing the breadth and accuracy of the search with the relevance of the retrieved studies [32]. In this pilot search, the objectives are to define a search string that finds the gold-standard set of papers and also helps in defining a more consistent protocol.

With this balance in mind, we conducted our pilot search with a set of keywords, their synonyms, and some acronyms related to the central research theme. An example of this set was: deep learning, deep machine learning, deep inspection, artificial intelligence, artificial neural network, neural network, convolution network, convolutional neural network, CNN, recurrent neural network, RNN, deep belief network, DBN, autoencoder, chest, lung, breastplate, pulmonary, thoracic, x-ray, radiograph, radiogram, CXR, child, pediatric, infant, baby, toddler, newborn, and neonate.
As suggested by [32], we reexamined our set of keywords based on the pilot search results. This re-evaluation process was repeated several times and resulted in the following observations:
• Regarding the synonyms for deep learning, only neural network, CNN, and convolutional network represented some significance and gain in the number of articles returned, while the other terms did not represent any change.
• Regarding chest, only breastplate did not add results, while the other terms returned more articles.
• The terms child, infant, baby, toddler, newborn, and neonate were expected to reduce the scope of the search to pediatric radiographs, but this was discarded due to the reduced number of articles returned; in the Scopus digital library, for example, just one article was returned.

At the end of the pilot search execution and results evaluation cycle, we arrived at the following set of keywords, which was used in our search string, organized as follows:
• (deep learning OR neural network OR CNN OR convolution net AND (((chest OR lung OR thora) AND (x-ray OR radiogra)) OR CXR) AND (pediatr OR paediatr OR infant OR baby OR newborn OR child))

It is noteworthy that "neural network", "thora", "radiogra", and "child" are sub-strings that match, for example, recurrent neural network, convolutional neural network, and artificial neural network; thoracic and thorax; radiogram and radiography; and children and childhood, respectively.

Another important point is that we had the help of a DL expert to define the synonyms for related terms; beyond this, we built word clouds with the keywords and titles. This resource is very interesting because it makes it easier to see word frequencies: the more often words are used, the larger and bolder they appear. This allows one to check word adherence in the search string and make possible adjustments.
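The way such sub-strings expand to their variants can be sketched with a simple prefix match; this is an illustration of the principle, not the matching logic of any particular search engine, and the term lists are example vocabulary only:

```python
import re

# Illustrative sketch: a wildcard stem such as "thora*" matches any term
# beginning with that stem, analogous to truncation in library search engines.
def stem_matches(stem, terms):
    pattern = re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)
    return [t for t in terms if pattern.search(t)]

terms = ["thorax", "thoracic", "radiography", "radiogram",
         "children", "childhood", "ultrasound"]
print(stem_matches("thora", terms))     # → ['thorax', 'thoracic']
print(stem_matches("radiogra", terms))  # → ['radiography', 'radiogram']
print(stem_matches("child", terms))     # → ['children', 'childhood']
```

Checking stems this way against the pilot-search vocabulary helps confirm that no intended variant is silently missed.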
Our final word cloud can be seen in Figure 3.

Once all the keywords were defined and our search string was complete, we constructed specific queries for each digital library. The specific queries are necessary because each library has different characteristics, depending on its possibilities and limitations. For example, some of them do not allow the use of complete search strings; in others, it is necessary to complement these strings with simple textual searches.

Fig. 3: Word clouds of the keywords and of the article titles returned by the digital library searches used in our study.

3) Search Strategy:
Having defined our search string, we had to choose the ideal set of study sources applicable to our theme. This set of sources followed a list of prerequisites in accordance with this protocol:
• sources considered relevant for the deep learning and medical imaging areas;
• sources with a search mechanism available on the Web and support for logical expressions;
• sources that allow exporting results in a format compatible with the StArt tool [41];
• sources that allow read access to the returned studies; and
• sources that allow searches at least over the title and abstract metadata.

An important note is that all searches were carried out on the same day, May 20, 2020, using automatic web mechanisms and queries defined from the search string.

As a result, the sources chosen for this SM include the following search engines and digital libraries (see Table II): ACM Digital Library (configured for the Guide to Computing Literature, as it indexes a broader collection of papers), Engineering Village, Embase, IEEE Xplore, Scopus, and PubMed.

With the selected sources and the set of keywords in hand, we then performed the specific search in each digital library. The search was executed on the title, abstract, and keywords of the papers, except in the PubMed library, which did not allow searching in keywords. Table II shows the final queries for each of the digital libraries used in this SM study.
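The per-library adaptation can be sketched by assembling the shared boolean term groups once and then wrapping them in each engine's field syntax. The sketch below is illustrative only; the field wrapper shown is a simplification, and Table II gives the actual queries used:

```python
# The three term groups common to the queries in Table II, joined with AND;
# each digital library then adds its own field-restriction syntax.
DL_TERMS = "((deep* OR neural* OR convolution*) AND (network* OR learning*)) OR CNN*"
CHEST_TERMS = "((chest* OR lung* OR thora*) AND (x-ray* OR radiogra*)) OR cxr*"
CHILD_TERMS = "pediatr* OR infant* OR baby* OR newborn* OR child* OR paediatr*"

def build_query(groups, field_prefix=None):
    """Join term groups with AND; optionally wrap in a field restriction."""
    body = " AND ".join(f"({g})" for g in groups)
    return f"{field_prefix} ({body})" if field_prefix else body

groups = (DL_TERMS, CHEST_TERMS, CHILD_TERMS)
scopus_style = build_query(groups, field_prefix="TITLE-ABS-KEY")  # Scopus-like
plain_style = build_query(groups)  # engines that search all metadata by default
print(scopus_style[:60])
```

Generating the variants from one canonical term set reduces the risk of the per-library queries drifting out of sync.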
4) Selection criteria:
After the final queries were established, we defined the inclusion and exclusion criteria to be used in the selection of primary studies. The exclusion criteria (EC) are as follows:
◦ EC1: Full text not accessible.
◦ EC2: It is not in the English language.
◦ EC3: It is not a scientific article published in the annals of events or in journals.
◦ EC4: It is not about deep learning applied to PCXR.
◦ EC5: It was published before 2010.
◦ EC6: It is not a primary study.
◦ EC7: It is an old version of a study already considered.

A study is excluded when it falls under at least one of these exclusion criteria. If not excluded, the study must meet each of the following inclusion criteria (IC):
• IC1: It is a primary study.
• IC2: It is about deep learning applied to PCXR.
• IC3: It was published in 2010 or later.
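Screening against such criteria behaves like a short-circuiting filter: a study is rejected at the first exclusion criterion it meets. The sketch below illustrates this; the record field names are assumptions for the illustration, and EC3/EC7 are omitted for brevity:

```python
# Illustrative sketch: apply the exclusion criteria in order; a study is
# rejected at the first criterion it meets, otherwise it is included.
# (Field names are hypothetical; EC3 and EC7 are omitted for brevity.)
def screen(study):
    checks = [
        ("EC1", not study["full_text_accessible"]),
        ("EC2", study["language"] != "English"),
        ("EC4", not study["about_dl_on_pcxr"]),
        ("EC5", study["year"] < 2010),
        ("EC6", not study["primary_study"]),
    ]
    for criterion, met in checks:
        if met:
            return ("excluded", criterion)
    return ("included", None)

study = {"full_text_accessible": True, "language": "English",
         "about_dl_on_pcxr": True, "year": 2018, "primary_study": True}
print(screen(study))  # → ('included', None)
```

Recording which criterion triggered each exclusion is what makes a per-criterion breakdown, such as the one reported later in Table IV, possible.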
B. Conducting
The conduction phase encompasses the activities of identi-fication and selection of primary studies, data extraction andsynthesis. The search strategy as part of the study protocolallows the identification of the studies, whereas the selection ofthese relies on inclusion and exclusion criteria and assessmentquality criteria for primary studies, both previously defined inthe study protocol. The data extraction activity starts as soon TABLE II: Query final for one each of digital libraries
DIGITALLIBRAY QUERYACM DLGuide toComputingLiterature (Title:(((deep OR neural OR convolution) AND (network ORlearning)) OR CNN) AND Title:(((chest OR lung OR thora*)AND (x-ray OR radiogra*) OR cxr) AND Title:(pediatr* ORinfant OR baby OR newborn or child* or paediatr*)) OR (Ab-stract:(((deep OR neural OR convolution) AND (network ORlearning)) OR CNN) AND Abstract:(((chest OR lung OR thora*)AND (x-ray OR radiogra*)) OR cxr) AND Abstract:(pediatr*OR infant OR baby OR newborn or child* or paediatr*)) OR(Keyword:(((deep OR neural OR convolution) AND (network ORlearning)) OR CNN) AND Keyword:(((chest OR lung OR thora*)AND (x-ray OR radiogra*)) OR cxr) AND Keyword:(pediatr*OR infant OR baby OR newborn or child* or paediatr*))Engineer-ingVillage ((deep* OR neural* OR convolution*) AND (network* ORlearning*)) OR CNN* AND (((chest* OR lung* OR thora*) AND(x-ray* OR radiogra*)) OR cxr*) AND (pediatr* OR infant* ORbaby* OR newborn* OR child* OR paediatr*)Embase ((deep*:ti,ab,kw OR neural*:ti,ab,kw OR convolution*:ti,ab,kw)AND (network*:ti,ab,kw OR learning*:ti,ab,kw) ORCNN*:ti,ab,kw) AND ((chest*:ti,ab,kw OR lung*:ti,ab,kw ORthora*:ti,ab,kw) AND (’x ray*’:ti,ab,kw OR radiogra*:ti,ab,kw)OR cxr*:ti,ab,kw) AND (pediatr*:ti,ab,kw OR infant*:ti,ab,kwOR baby*:ti,ab,kw OR newborn*:ti,ab,kw OR child*:ti,ab,kwOR paediatr*:ti,ab,kw)IEEEXplorer (((("All Metadata":deep OR "All Metadata":neural OR "All Meta-data":convolution) AND ("All Metadata":network OR "All Meta-data":learning)) OR "All Metadata":CNN) AND ((("All Meta-data":chest OR "All Metadata":lung OR "All Metadata":thora*)AND ("All Metadata":x-ray OR "All Metadata":radiogra*))OR "All Metadata":CXR) AND ("All Metadata":pediatr* OR"All Metadata":infant OR "All Metadata":baby OR "All Meta-data":newborn OR "All Metadata":child* OR "All Meta-data":paediatr*))Scopus TITLE-ABS-KEY (((deep* OR neural* OR convolution*) AND(network* OR learning*)) OR CNN* AND (((chest* OR lung*OR thora*) AND (x-ray* OR radiogra*)) OR cxr*) AND (pe-diatr* 
OR infant* OR baby* OR newborn* OR child* ORpaediatr*))PubMed (((deep*[Title/Abstract] OR neural*[Title/Abstract] ORconvolution*[Title/Abstract]) AND (network*[Title/Abstract]OR learning*[Title/Abstract])) OR CNN*[Title/Abstract])AND(((chest*[Title/Abstract] OR lung*[Title/Abstract]OR thora*[Title/Abstract]) AND (x-ray*[Title/Abstract]OR radiogra*[Title/Abstract])) OR CXR*[Title/Abstract])AND (pediatric[Title/Abstract] OR infant[Title/Abstract]OR baby[Title/Abstract] OR newborn[Title/Abstract] ORchild[Title/Abstract] OR paediatric[Title/Abstract]) as the relevant primary studies are selected. Next, a synthesisof these studies is performed to answer the research questionsof the SM.To decide when studies should be rejected or not, we readthe title, summary, and keywords of each study, and if theywere not sufficient for decision making, read the full article.After applying the inclusion and exclusion criteria of 178studies identified in the automatic search, we had 78 duplicatearticles, 74 removed by EC and 26 included. The Table IIIig. 4: A detail view of the identification and selection processes of primary studies, adapted from [42].TABLE III: Result of inclusion and exclusion criteria applied
DIGITAL LIBRARY     | Identified | Duplicated | Removed | Included
ACM DL              | 6          | 5          | 1       | 0
Engineering Village | 36         | 33         | 3       | 0
Embase              | 32         | 17         | 15      | 0
IEEE Xplore         | 13         | 7          | 4       | 2
PubMed              | 16         | 16         | 0       | 0
Scopus              | 75         | 0          | 51      | 24
TOTAL               | 178        | 78         | 74      | 26

Table III shows the result of this process for each digital library. Table IV gives a breakdown of the exclusion criteria in each base. Importantly, criterion 4 was responsible for the largest number of excluded articles, followed by criterion 5, while criteria 2 and 7 were responsible for the fewest. This is because, in most systematic studies, only a small fraction of returned studies is relevant to the research topic studied [36], [42]. Regarding criteria 2 and 7, it is agreed that most publications are in the English language and in unique versions.
TABLE IV: Breakdown of the exclusion criteria
EXCLUSION CRITERIA  | EC1 | EC2 | EC3 | EC4 | EC5 | EC6 | EC7 | TOTAL
ACM DL              | 0   | 0   | 0   | 1   | 0   | 0   | 0   | 1
Engineering Village | 1   | 1   | 0   | 0   | 1   | 0   | 0   | 3
Embase              | 1   | 0   | 4   | 10  | 0   | 0   | 0   | 15
IEEE Xplore         | 0   | 0   | 0   | 3   | 1   | 0   | 0   | 4
PubMed              | 0   | 0   | 0   | 0   | 0   | 0   | 0   | 0
Scopus              | 5   | 1   | 0   | 26  | 10  | 8   | 1   | 51
TOTAL               | 7   | 2   | 4   | 40  | 12  | 8   | 1   | 74
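The duplicate-removal step reported in Table III (78 of 178 records) can be sketched as a normalized-title match across library exports. This is a minimal illustration, not the authors' tooling; the record structure and titles below are hypothetical.

```python
def normalize(title):
    # Case-fold and keep only alphanumerics so that minor punctuation
    # and spacing differences between libraries do not block a match.
    return "".join(ch for ch in title.lower() if ch.isalnum())

def deduplicate(records):
    # records: list of (source, title) pairs. The first occurrence of a
    # title is kept (the "kept" instance); later copies are dropped.
    seen, kept, dropped = set(), [], 0
    for source, title in records:
        key = normalize(title)
        if key in seen:
            dropped += 1
            continue
        seen.add(key)
        kept.append((source, title))
    return kept, dropped

# Hypothetical exports from three of the six sources.
records = [
    ("Scopus", "Pediatric pneumonia detection with CNNs"),
    ("PubMed", "Pediatric Pneumonia Detection with CNNs."),
    ("Embase", "Lung field segmentation in children"),
]
kept, dropped = deduplicate(records)
```

In practice reference managers also compare DOIs and author lists, but title normalization already catches most cross-library copies.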
In addition to these steps, we also applied the snowballing technique to three of the 26 selected studies, the most cited and correlated among them (see Table V). This technique allowed us to evaluate our search string by comparing the returned articles with those referenced by these studies. The three studies are:
* S3: A transfer learning method with deep residual network for pediatric pneumonia diagnosis [43].
* S10: Classification of images of childhood pneumonia using convolutional neural networks [44].
* S18: Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning [45].
TABLE V: Correlation between selected studies
Study | Cited by
S3    | S2, S16
S4    | S16
S10   | S11, S15
S14   | S2
S18   | S2, S3, S5, S6, S11, S12, S13, S16, S18, S23, S24, S25
S24   | S25
S26   | S9
Regarding duplicate studies, Table VI lists, as an example, the selected papers and their duplicates per information source. The ◦ symbol represents a study instance excluded because copies exist in more than one bibliographic database; the • symbol, in turn, represents the instance of a duplicate study kept for the extraction phase. Of the 26 studies selected after the exclusion criteria, only 6 have no duplicates (S2, S4, S11, S16, S22, S23). The numbers of papers identified, duplicated, excluded, and evaluated before the data extraction and mapping process are shown in Figure 4, whereas the list of 26 studies selected after this process, which included the snowballing technique, can be seen in Table VII.
IV. DATA EXTRACTION AND MAPPING PROCESS
This section describes the most important aspects and information extracted from the full-text reading of the 26 primary studies selected, which includes:
• the main objective and respective RQs;
• the selection methods of primary studies; and
• the evidence collected from the synthesis of these studies.
TABLE VI: Selected papers and their duplicates per information source (• = instance kept for extraction, ◦ = duplicate instance excluded)

Source              | Kept (•) | Excluded duplicates (◦)
ACM DL              | 0        | 2
Engineering Village | 0        | 16
Embase              | 0        | 10
IEEE Xplore         | 2        | 2
PubMed              | 0        | 9
Scopus              | 24       | 0
TABLE VII: Selected primary studies

ID  | TITLE | Reference
S1  | A Generic Approach to Lung Field Segmentation from Chest Radiographs using Deep Space and Shape Learning | [25]
S2  | A novel transfer learning based approach for pneumonia detection in chest X-ray images | [46]
S3  | A transfer learning method with deep residual network for pediatric pneumonia diagnosis | [43]
S4  | An Efficient Deep Learning Approach to Pneumonia Classification in Healthcare | [47]
S5  | Automated deep learning design for medical image classification by health-care professionals with no coding experience: a feasibility study | [48]
S6  | Automated pneumonia diagnosis using a customized sequential convolutional neural network | [49]
S7  | Automatic Catheter and Tube Detection in Pediatric X-ray Images Using a Scale-Recurrent Network and Synthetic Data | [50]
S8  | Automatic tissue characterization of air trapping in chest radiographs using deep neural networks | [51]
S9  | Classification of bacterial and viral childhood pneumonia using deep learning in chest radiography | [52]
S10 | Classification of images of childhood pneumonia using convolutional neural networks | [44]
S11 | Classification of pneumonia from X-ray images using siamese convolutional network | [53]
S12 | Deep Learning Method for Automated Classification of Anteroposterior and Posteroanterior Chest Radiographs | [54]
S13 | Deep learning to automate Brasfield chest radiographic scoring for cystic fibrosis | [55]
S14 | Deep learning, reusable and problem-based architectures for detection of consolidation on chest X-ray images | [56]
S15 | Detecting pneumonia in chest radiographs using convolutional neural networks | [57]
S16 | Detection of Pediatric Pneumonia from Chest X-Ray Images using CNN and Transfer Learning | [58]
S17 | Discriminant Analysis Deep Neural Networks | [59]
S18 | Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning | [45]
S19 | Learning to Recognize Chest X-ray Images Faster and More Efficiently Based on Multi-Kernel Depthwise Convolution | [60]
S20 | LungAIR: An automated technique to predict hospitalization due to LRTI using fused information | [61]
S21 | Marginal shape deep learning: Applications to pediatric lung field segmentation | [62]
S22 | Pulmonary rontgen classification to detect pneumonia disease using convolutional neural networks | [63]
S23 | Simultaneous Lung Field Detection and Segmentation for Pediatric Chest Radiographs | [64]
S24 | Two-stage deep learning architecture for pneumonia detection and its diagnosis in chest radiographs | [65]
S25 | Using deep-learning techniques for pulmonary-thoracic segmentations and improvement of pneumonia diagnosis in pediatric chest radiographs | [66]
S26 | Visualizing and explaining deep learning predictions for pneumonia detection in pediatric chest radiographs | [67]
TABLE VIII: Summary of the standard extraction form (one row per study S1-S26, one column per form question FQ1-FQ22)

Legend:
FQ1. Year of publication? Year (YYYY)
FQ2. Publishing vehicle? A) Magazine, B) Journal, C) Event
FQ3. Research type classification? A) Solution proposal, B) Evaluation, C) Validation, D) Experience report, F) Opinion
FQ4. Research method adopted? A) Case study, B) Controlled experiment, C) Simulation, D) Prototyping
FQ5. Type of contribution? A) Algorithm, B) Application, C) Framework, D) Product, E) Others
FQ6. Learning approach? A) Unsupervised, B) Weakly supervised, C) Supervised
FQ7. Did it use a public dataset? Y) Yes, N) No
FQ8. Dataset used (PCXR)? A) Belarus, B) CNHS, C) Guangzhou, D) NIH, E) Private
FQ9. Type of radiographic view used? L) Lateral, AP) frontal anterior-posterior, PA) frontal posterior-anterior, F) frontal AP and PA, X) not available
FQ10. Amount of images used (PCXR only)? Integer
FQ11. Origin of the images? R) Real, S) Synthetic
FQ12. Used any additional information? Y) Yes, N) No
FQ13. What additional information was used? A) Diagnostic labels, B) Masks, C) Other tests, E) Social information, F) Clinical report, G) Reference values
FQ14. Does it perform any preprocessing steps? Y) Yes, N) No
FQ15. What preprocessing steps are used? A) Normalization, B) Resizing, C) Cropping, D) Segmentation, E) Suppression, F) Improvement
FQ16. Does it make use of any international standards? Y) Yes, N) No
FQ17. What international standards are used? A) HIPAA (Health Insurance Portability and Accountability Act), B) HL7, C) HITECH, D) ISO, E) IEC
FQ18. Did it require authorization from an ethics committee? Y) Yes, N) No
FQ19. What is the task covered? A) Classification, B) Diagnostic, C) Enhancement, D) Segmentation, E) Object recognition, F) Localization, G) Detection, H) Prediction/prognostic
FQ20. What evaluation metrics are used? A) Recall, B) Precision, C) Accuracy, D) Loss, E) Confusion matrix, F) AUC, G) F1 score, H) Errors, I) Speed, J) Dice coefficient, K) Others
FQ21. What deep learning architecture is used? A) Autoencoder, B) CNN, C) LSTM, D) Recurrent Neural Network, E) Residual Neural Network, F) Restricted Boltzmann Machines, X) not reported
FQ22. Processing approach? SEQ) Sequential, PAR) Parallel/GPU, X) not reported
n/a: Non-applicable

In this extraction phase, two independent reviewers extracted the data using a standardized form, while a third, more experienced reviewer resolved doubts and disagreements. An important note is that some of the features may appear in several studies; therefore, the totals may not always add up to 100%.
The standard extraction form is summarized in Table VIII. The questions (FQ[i-th]) in this form are intended to cover the entire research topic. Some of them, such as FQ2, FQ3, and FQ4, are based on the classifications proposed by [37], while
FQ21 is based on the scheme of DL architecture classification proposed by [68]. Table IX shows the relationship between the research questions (RQs) and the extraction-form questions (FQs), the latter used to answer the former. The analysis of each research question follows. All primary studies selected for data extraction and synthesis date from the last five years, according to the criteria adopted in their selection, and the automatic search returned them from only two of the six digital libraries: 24 from Scopus and 2 from IEEE Xplore.
A. About the Research Questions (RQs)
The answers shown next, for each RQ, synthesize the results obtained in the extraction by FQs; far from being conclusive, they are rather strong indications of the possible paths taken in the research topic under analysis.
RQ1: What is the state of the art of DL on PCXR image tasks?
Although the answer to this main question cannot be given by a single FQ from our extraction form or by one of the subsequent RQs (detailed in the sequel of this text), it can be understood from the aggregate of these questions. For example, from the extraction performed on the 26 studies, all dating from 2016 onwards, it is clear that this research topic is still of great interest to the scientific community, and that this interest lies on a growth curve. This is corroborated by the time graphs of Figure 5, built from the 26 studies of our driving phase, and of Figure 1, automatically generated by the Google Trends tool for the terms "deep learning" and "chest x-ray" and already shown at the beginning of this work.
The publications are equally divided between journals and events, and most are classified as solution proposals adopting controlled experiments as the research method (FQ2, FQ3, and FQ4). This evidence, added to the results presented in these studies, reveals a constant process of evolution, with new limits being established every day, and a search for the efficiency, effectiveness, and safety that would allow adoption in hospital and health-care environments.
The techniques applied in the studies point to the Convolutional Neural Network (CNN) and Residual Neural Network (ResNet) architectures (FQ21) as the main actors in image-processing applications, with a history of high precision and accuracy.
Fig. 5: Number of studies selected for extraction, by year
In relation to the tasks on PCXR images, some, such as classification and detection (FQ19), seem to attract more attention from the scientific community, but others have also been researched, which is a good indication of their importance in this research field. In addition, the small number of public PCXR data sets has been an additional challenge, currently circumvented with transfer learning and data augmentation techniques, but not yet explored using generative adversarial architectures (considering our 26 articles).
Whereas DL has shown impressive advances in many fields, in the medical field, and more specifically in CXR imaging, the technique is still in its infancy. Many limitations are yet to be overcome, such as better interpretability of models to allow confrontation with the opinion of medical specialists, the establishment of international standards and specific metrics to guide and validate the results of the studies, and the transition of these proposed solutions, still in the research stage, to application in industry, commerce, and hospital environments.
In conclusion, and as already noted by [18] and [27], DL is a powerful and ever-expanding technique that can, for example, combine image analysis with the analysis of radiology text reports, which brings remarkable possibilities and makes us believe that CAD systems generating automated reports for CXR images may soon become reality.
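Data augmentation of the kind these studies rely on can be illustrated without any imaging library. The toy 4x4 "image" (a nested list) and the function names below are ours; real pipelines operate on arrays via NumPy, torchvision, or similar.

```python
import random

def hflip(img):
    # Mirror each row (horizontal flip), a common augmentation,
    # though clinically debatable for chest films.
    return [row[::-1] for row in img]

def random_crop(img, size, rng=random.Random(0)):
    # Crop a size x size window at a random offset, simulating small
    # translations of the thorax within the frame. Fixed seed keeps
    # the sketch reproducible.
    h, w = len(img), len(img[0])
    top = rng.randrange(h - size + 1)
    left = rng.randrange(w - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]

# Toy 4x4 grayscale "radiograph" with pixel values 0..15.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
aug = random_crop(hflip(img), 3)
```

Each pass over the training set can apply a different random crop, so one labeled image yields many slightly different training samples.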
RQ2: Which tasks applied to chest radiograph imaging are most addressed by deep learning techniques?
As noted in Figure 6, virtually all tasks were covered in the analyzed studies, except for the enhancement and object-recognition tasks. Although the analysis is supported by only 26 studies, it can be observed that research spans a wide range of tasks: apart from classification, which received the most attention, and prediction/prognostic, which received the least, the other tasks received similar attention.
RQ3: What are the metrics used for assessment?
It may be premature to indicate which metrics are best for each task associated with PCXR images based on 26 studies alone, but it is clear that, for classification, metrics such as FQ20-C (accuracy) and FQ20-F (AUC) have been adopted most, whereas for segmentation the FQ20-K metric (others) has been used most often. These metrics are already well known in the literature, which reveals that studies in DL and PCXR are not concerned with the development of
TABLE IX: A summary of the relationship between research questions and form questions
X | RQ1 RQ2 RQ3 RQ4 RQ5 RQ6 RQ7 RQ8 RQ9 RQ10 RQ11 RQ12
FQ1-FQ22: each form question is marked with ◦ in the RQ it answers indirectly and with • in the RQ it answers directly.

Legend:
RQ. Research question
FQ. Form question
◦ Indirectly answers the research question.
• Directly answers the research question.
Fig. 6: Division of tasks covered in the studies (classification, detection, segmentation, diagnostic, prediction/prognostic)
new evaluation metrics and are quite comfortable with their use.
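Several of the metrics in the FQ20 list (accuracy, precision, recall, F1, and the Dice coefficient used for segmentation) can be stated from first principles. A small pure-Python sketch with toy labels of our own; AUC is omitted because it needs ranked scores rather than hard predictions.

```python
def confusion(y_true, y_pred):
    # Binary confusion counts: true/false positives and negatives.
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(p == 1 and t == 0 for t, p in zip(y_true, y_pred))
    fn = sum(p == 0 and t == 1 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def scores(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

def dice(mask_a, mask_b):
    # Overlap measure for segmentation masks (flat 0/1 lists):
    # 2 * |A intersect B| / (|A| + |B|).
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    return 2 * inter / (sum(mask_a) + sum(mask_b))
```

Libraries such as scikit-learn provide tested implementations of these same formulas; the point here is only how little is behind each reported number.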
RQ4: What are the main data sets used in this research field, and how are they organized?
Although Figure 7, FQ7, and FQ10 show that most studies use public sets with a much larger number of images than the private sets mentioned, these public sets are still few and hold a limited number of images, even more so if we compare them to public sets from other domains with hundreds of thousands of images, such as the ImageNet and COCO [69] data sets, or to sets from the same domain, such as Chest X-ray14 [70] from the NIH Clinical Center and PadChest [71] from Hospital San Juan de Alicante.
Even though CXR is one of the most widely performed diagnostic tests in the world and there is an abundance of real images (FQ11), this is not reflected in the number of pediatric image data sets available. Difficulties include, among other things, privacy and security restrictions on patient records and the arduous and expensive task of validating diagnoses and labeling (mainly associated with detection and segmentation).
Among the 26 papers covered by this work, the largest set is the one provided by the Guangzhou Women and Children's Medical Center (FQ8-C), with 5,856 images, which is consequently the most cited. Other public sets have also been used in the studies, although less frequently, such as the CNHS data set (FQ8-B), collected at the Children's National Health System, and the subset of pediatric images from the NIH Clinical Center data set (FQ8-D).
Studies S1, S21, and S23, for example, which use a small number of images, are related specifically to the segmentation task; this is explained by the difficulty of creating such data sets, since correctly defining the masks used as reference demands much time, expertise, and care. Moreover, despite presenting good results in the evaluated metrics, these studies show the weakest comparative evaluations against other studies in the literature, a fact that, combined with the reduced number of images, casts doubt on their results. On the other hand, the studies with the largest numbers of images are also those with the strongest results and evidence of contribution, mainly because they make more detailed comparative evaluations against other studies in the literature.
We can also see in Figure 7 and in the responses to FQ9 that the frontal view (AP/PA) is predominant, mostly in the AP projection, whereas the lateral view was used in only two studies (and in one study the projection type was not available).
Fig. 7: Projection of PCXR used; the 'y' axis indicates the number of studies.
This is explained by the fact that this projection allows a better evaluation, leaving the scapula out of the visual field and showing the heart at a size closest to its actual size; the projection is also preferred because it is a standard radiographic technique, allowing accurate and valid comparison between repeated AP/PA CXRs [72].
Finally, although the number of images cannot be said to be the determining factor in the quality of a study or of the applied DL technique, mainly because, as shown in [73], the gain from increasing the number of images has limits, a larger set can support the generalization of the proposed solution and its application to more cases, reducing overfitting.
RQ5: Did the work have ethics committee authorization?
An important aspect of research in the medical area is that it is surrounded by restrictions: it involves high-security information and requires patient anonymity and authorization for the use of data. It is, moreover, important, and indeed a requirement of most health journals, that authors of all investigations on human participants state whether the study was approved by an ethics committee and how consent was obtained [74].
We did not find information on authorization from any ethics committee in 11 of the 26 studies analyzed (FQ18). Although this is not enough to state that they did not have one, it may indicate that demands in this direction do not receive much attention; on the other hand, it is important to note that only two of these 11 studies (S20, S23) did not use public data sets (FQ7).
[75] points out that most public data, especially that of government agencies or government-sponsored research, is collected under protocols approved by one or more ethics committees, so additional approval is not required, especially when no individual can be identified in the data and the data is already available to the public. We can therefore believe that most research is supported by some institutional review board, whether in data collection or in secondary analysis.
RQ6: What are the neural network architectures used in the works?
Deep learning is currently a dominant technology in our daily lives, with impact in several areas, ranging from entertainment, business, and security to health. This technology, built from neural networks, uses several (deep) layers of units with highly optimized algorithms and architectures. Deep learning comprises a large number of architectural models, which can be differentiated by the number of layers, the type of network, and the training method or algorithm, among other aspects.
The choice and adoption of a neural network architecture, however, must consider a more important aspect: its application domain. Long Short-Term Memory (LSTM) networks, for example, are commonly applied to natural language understanding and translation and to gesture and handwriting recognition; autoencoders to dimensionality reduction; adversarial networks to representation learning and topic modeling; and Residual Networks (ResNet) and Convolutional Neural Networks (CNN) to image recognition, just to name a few.
Thus, the adoption of an architecture is expected to consider its application domain, so as to take advantage of its characteristics and obtain better results. Accordingly, in the primary studies selected and under analysis, CNNs and ResNets and their variations are prevalent (FQ21-B and FQ21-E), and the reasons seem obvious: these architectures have an excellent record of accuracy and precision, excel at recognizing and classifying images, are freely available in different programming languages, and, in addition, have the most applications related to the studies of this research.
On the other hand, open questions from these studies point to more architectures that could be used: autoencoders and RBM networks for unsupervised training; Generative Adversarial Networks (GANs) to generate realistic synthetic data and so overcome the limitation of training data; and LSTM networks for report generation, to provide greater clarity and intelligibility to the results, for example.
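The building block that the prevalent CNN and ResNet architectures share is the 2D convolution (in DL practice, actually cross-correlation) followed by a nonlinearity. A minimal pure-Python sketch, with a toy image and kernel of our own choosing:

```python
def conv2d(img, kernel):
    # "Valid" cross-correlation: slide the kernel over the image and
    # take the sum of elementwise products at each position. This is
    # the core operation of a convolutional layer.
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)]
            for i in range(oh)]

def relu(fmap):
    # The usual nonlinearity applied to each feature-map value.
    return [[max(0.0, x) for x in row] for row in fmap]

# Toy 3x3 "image" and a 2x2 diagonal-difference kernel.
img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
fmap = relu(conv2d(img, [[1, 0], [0, -1]]))
```

A ResNet stacks such layers but additionally adds the layer input back to the layer output (a skip connection, output = F(x) + x), which is what makes very deep stacks trainable; frameworks such as PyTorch or TensorFlow implement all of this with learned kernels and GPU acceleration.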
RQ7: When and in which vehicle type were the articles published?
As mentioned in the introduction to this article, and later in Section IV, all selected studies date from 2016 onwards, with the majority from 2019 and 2020 (see Figure 5 and FQ1), which strengthens the case for the growing interest in the matter addressed in this research agenda.
These studies are distributed mainly into two types of publications: events, such as conferences and symposiums, and journals, which hold the majority of these publications; only a few appeared in magazines (FQ2).
Journals are usually associated with original research that has gone through a rigorous process with many rounds of review by expert peers in the field. Event articles have a faster and less rigorous review process and favor interaction with international audiences working in the same field, with negotiations and feedback being common. Magazines can carry authors' opinions not necessarily supported by the scientific literature, although they should not be overlooked in systematic studies.
Articles from journals and events are most commonly chosen and considered the best sources of citation and, in addition, provide more directions for future research, so the publications in this research agenda point strongly to this path.
Fig. 8: Projection between the pre-processing and processing approaches; the 'y' axis indicates the number of studies.
RQ8: What are the details of the types of data and processes applied in the DL techniques?
As mentioned in RQ1 and RQ6, CNNs and ResNets were the predominant architectures in the studies, applied mainly to the classification and detection tasks, with pneumonia as the main pathology addressed, for reasons already mentioned in the introduction to this article. Considering only these main architectures, all studies used supervised training as the learning technique, with the application of backpropagation and gradient-descent-based algorithms, typical features of these architectures [68].
An important observation is that the sequential processing approach (FQ22-SEQ) was mentioned in only four of the studies, which suggests that parallel processing is becoming the standard approach, mainly due to advances in graphics processing units (GPUs) and their processing power. However, even with these advances in processing, most studies still used preprocessing steps, as we can see in Figure 8, the most common being normalization (FQ15-A) and resizing (FQ15-B). This is largely because of the benefits they bring: training networks is a costly task, and steps that reduce this work are always appreciated.
RQ9: Which types of contributions result?
As we can see in Figure 9, and in the answers to FQ5, the vast majority of studies presented some framework or algorithm as the proposal or contribution of their work, mainly related to the application and adaptation of CNN and ResNet architectures and, in some cases, to a combination of models of these architectures. As for the other studies, both are evaluation studies: S5 evaluates the use of automated DL software by health professionals with no coding or DL experience, and S13 evaluates the hypothesis that a deep CNN model can automate the Brasfield score on CXRs of patients with cystic fibrosis (CF) with performance similar to that of a pediatric radiologist.
Fig. 9: Result proposed in the analyzed studies (framework, algorithm, others)
In view of what was presented, we can reach some conclusions: A) there is great interest on the part of researchers in using, improving, and developing deep learning models; B) the works are unanimous in pointing out the limited availability of data prepared systematically, rigorously, and carefully enough to ensure the generalization of results to external data different from that used to build the models; and C) although the research shows excellent performance in the use of these models, none of the studies delivered a product or application as a result, which can be interpreted as research still at an experimental stage.
RQ10: Is there any international standard, and is it applied?
Today, the need for multi-system connectivity and electronic data transfer is growing in the health-care sector. The necessity of integrating systems and communicating information in this sector becomes evident when one considers the variety of interested parties and the multitude of applications and their importance. One way to achieve this is through standardization; at the same time, several requirements become evident from the many experiences reported from different sources [76].
In picture archiving and communication systems (PACS), for example, it is common to use the international standard DICOM® (Digital Imaging and Communications in Medicine) [77], which, among other issues, defines rules for the transmission, storage, and display of image information to ensure interoperability and integration between devices.
In this research agenda, we looked for similar standards, although the focus of the analyzed studies is not the generation of images but their subsequent manipulation. We would therefore expect standards such as DICOM® to be well established, remembering that 85% of the studies used a public data set (FQ7), yet few of the studies mention this standard in their text.
Despite this importance, of the analyzed studies only S12 and S13 cited the use of some standardization. The lack of standardization may be explained by the lack of standards in the area, or by the fact that the studies are still confined to the academic environment. We believe that when the proposed solutions become viable industry applications and products, they will have to conform to some standardization.
RQ11: How is the study classified?
Based on Petersen et al. [28] and Wieringa et al. [78], the classification that best defines the research type of the analyzed studies (88.5% of them) is "Solution Proposal" (FQ3). This is because these studies propose new techniques, or enhancements to existing techniques, for solving tasks related to PCXR images (see FQ19). The authors of these studies also discuss their proposals and compare them with other related studies.
RQ12: Which research method was adopted?
With respect to the adopted method [28], [78], with the exception of S14, which is also a "Case Study", all others are "Controlled Experiments", since they are performed in an academic environment under specific conditions, using a well-defined data set that is not affected by external factors.
V. RESEARCH AGENDA
In this research agenda, we carefully analyzed 26 studies [S1-S26] and found evidence of gaps still present in the effective application of DL to pediatric chest radiological images and in the state of the art of these DL techniques. The existing solutions found in the research, although showing promising results, are still maturing, with answers that remain fragile against the requirements of clinical application. As a contribution, we have outlined a preliminary but well-founded research agenda to close this gap, containing studies that:
1) establish objective metrics for each task, helping researchers measure the performance and generalization of their solutions, as proposed by the American College of Radiology, or that allow physicians to calculate the uncertainty estimates of the networks and their confidence level [79];
2) establish a set of standards for the creation and sharing of databases, for example, similar to the concept of the ATM network or to the gold-standard diagnosis, that guarantee security and anonymity; none of the analyzed studies explicitly mentioned this gap;
3) assess the impact of generating data annotations via crowd-sourcing, especially for processes that require a high level of expertise and are tiring, slow, scarce, and very expensive;
4) evaluate the possible impact of reduced dimensionality on the accuracy of the models; although the absolute majority of the 26 analyzed studies perform this reduction, no observation other than the processing cost is made in this regard;
5) investigate DL architectures based on more than just data, for example, models based on a combination of data and physics [80], which can help with both generalization and interpretability issues;
6) demonstrate the robustness/fragility of DL architectures applied to PCXR against adversarial attacks or the presence of external noise;
7) investigate the application of DL to the tasks of generation, registration, or retrieval of pediatric
chest X-ray images; these tasks were not even mentioned in the analyzed studies;
8) evaluate unsupervised DL models, such as variational autoencoders (VAEs) and generative adversarial networks (GANs), mainly to deal with sets that are unbalanced between classes, scarce in number, or unlabeled;
9) demonstrate the possible gain, or not, of task-specific training as opposed to transfer learning, still a big challenge in PCXR due to the limited number of data sets with annotated images.
Finally, this agenda goes beyond a simple quantitative investigation of deep learning techniques applied to PCXR. It focuses on questions whose answers may have important implications for adopting (or not) deep learning standards, strategies, and architectures, as well as for lifting their limitations and pointing out their opportunities.
VI. DISCUSSION
Given the main research question RQ1 of this SM, namely the state of the art of DL solutions applied to PCXR images, 26 articles carefully selected by a rigorous protocol were subjected to thorough analysis and synthesis guided by an extensive extraction form. To answer this question, our SM also draws lines for a research agenda and tries, unpretentiously, to answer a provocative question about the maturity of DL in its application to pediatric chest images.
In a systematic way, our extraction and synthesis process reaches evidence that confirms several conclusions brought by secondary studies on this topic. The evidence recovered by the
RQs, and also observed in related studies such as those described in Section II, clearly indicates several gaps, challenges, and trends still present in this research topic. We leave here, therefore, our impression of what we believe to be the most pressing points in this field of research.
First, although there are numerous metrics for assessment and measurement, there is, as far as we know, no international reference manual or standard for measuring and evaluating deep learning tasks, especially those associated with pediatric chest X-ray images. A trend may be the adoption of assessment standards such as those applied to major public challenges, as in Kaggle, or initiatives like that of the American College of Radiology (ACR) through algorithm review processes [26].
Second, applications in clinical medicine are not presented in any analyzed study, even in those with expressive results, while only one case study (S14) is reported; either that information has been suppressed by the studies or, more likely, the solutions are not yet sufficiently mature and safe.
Third, DL has shown impressive results in several fields of research, in some cases surpassing human performance, frequently in areas such as precision agriculture, autonomous vehicles, and the games industry, and to a lesser extent in medical imaging, as in optical coherence tomography (OCT) for diabetic retinopathy detection. Regarding its application to PCXR images, the results still seem incipient and of lesser impact, in our view requiring more tests that guarantee greater generalization and robustness of the solutions.
Fourth, our SM is not a list of all the open questions about the application of DL to pediatric chest X-ray images, and, as already pointed out in Section V, many directions can still be followed.
Therefore, our discussion only tries to bring out the most pressing aspects presented in the 26 studies analyzed, which may be of some interest to future researchers.

Finally, our question about the maturity of the application of DL to pediatric CXR images is provocative and, therefore, we do not intend to give a definitive answer here, only to raise some points that may help readers design their own experiments. Why do we believe that DL is still in childhood with respect to pediatric chest X-ray images? Although DL was applied to the analysis of medical images as early as 1995 by Lo et al. [81], only from 2015 onward did it see massive application in medical images [18], and more precisely from 2016 in PCXR images (S8); considering only this last fact, we could chronologically define DL as a four-year-old child. This observation is similar to that of LeCun et al. [82], who state that systems combining DL and reinforcement learning are still in their childhood, although they outperform passive vision systems in classification tasks.

Further evidence comes from comparing the application of DL in other areas, such as games and autonomous vehicles, or to other medical images, such as optical coherence tomography (OCT) for the detection of diabetic retinopathy (S18), CT for cancer detection, skeletal bone age assessment in X-ray images of the hand, or automatic classification in MRI of the spine, all of which appear much more mature, to name a few.

As already seen in this article (FQ6), another point that may explain the maturity stage of DL applications in pediatric CXR is its reliance on supervised learning, which depends on a large amount of data and on the participation of professionals in a costly and laborious data-labeling process. The fact that this research has not yet included controlled field experiments (FQ4) and that no clinical products or applications have been presented can be partially explained by the fragility of the solutions' generalization and by the need to standardize the metrics used to test performance and precision, in addition to the clearly urgent needs for interpretability, simplicity, mathematical foundations, and even uncertainty estimates about their decisions, all of which would enable debate with physicians and address ethical questions about responsibility in these matters.

However, we cannot fail to highlight that the application of DL to pediatric CXR images has surpassed the state of the art in several tasks and has achieved quite impressive results, in some cases equivalent or superior to those achieved by medical specialists, which until then no knowingly mature technique had achieved.

Other points also indicate the maturation of the application of DL to PCXR images, mainly those referring to tools that support medical professionals, such as the screening of suspected cases, suppression of bone structures, pre-processing for improved archiving and storage, orientation correction, or CXR view classification.

Although some DL architectures, such as CNN and ResNet, seem more established on PCXR images, and tasks such as classification, detection, and segmentation have received more attention, this field of research is on the rise and has drawn much attention from the scientific community, leading to the belief that it is only a matter of time before other DL solutions are applied to PCXR images and other tasks. Our bet is that we will soon see annotation and report-generation tools, strong use of unsupervised learning, and crowdsourcing to deal with the scarcity of labeled data, with class imbalance, and with work on images of high resolution and dimension.

VII. CONCLUSIONS
Deep learning is not a new topic, having first appeared more than three decades ago, and the same can be said of chest radiographs and their study with applied computational techniques [18]. Hundreds of works, as shown in Figure 4, have been published combining these subjects, including several secondary studies [34], [83], [33], [18], [36] that provide snapshots of different moments in the evolution of these topics. To the best of the authors' knowledge, however, no SM study has been carried out on research related to the development of deep learning techniques applied to chest X-ray images, pediatric or not. Therefore, a mapping that offers a deep immersion into the details and characteristics of these techniques applied to PCXR images, and that also proposes a research agenda to address the gaps and trends of this topic, can add value for new researchers.

A consolidation of 26 primary studies out of 178 selected studies is presented, with a systematic and complete analysis of a hot topic of growing interest, while the lack of a similar study, as far as we know, justifies the design of this unprecedented research agenda.

The detailed protocol presented in Section III allows other researchers to validate, reproduce, and extend this study, thus constituting a significant contribution.

We understand that this study, finding only 26 articles, points to an area that is still evolving, one that has indeed taken important steps but that still needs more studies to reach greater maturity. In this sense, we believe it can be said that the application of DL to PCXR is still in childhood, and we are confident that this research agenda will contribute to its growth and maturity.

ACKNOWLEDGMENTS
We thank all those who collaborated directly and indirectly in the execution of this study, in particular the deep learning specialists of the Deep Learning Brazil group, who helped define several keywords of our search string.
REFERENCES

[1] U. Desa, "United Nations Department of Economic and Social Affairs, Population Division. World Population Prospects: The 2019 Revision (medium variant)," in Technical Report: 2019 Revision of World Population Prospects. United Nations, 2019.
[2] N. Pearce, N. Aït-Khaled, R. Beasley, J. Mallol, U. Keil, E. Mitchell, and C. Robertson, "Worldwide trends in the prevalence of asthma symptoms: phase III of the International Study of Asthma and Allergies in Childhood (ISAAC)," Thorax, vol. 62, no. 9, pp. 758–766, 2007.
[3] W. H. Organization et al., "The top 10 causes of death," World Health Organization, Geneva, Switzerland, 2018.
[4] I. N. Mammas and D. A. Spandidos, "The perspectives and the challenges of paediatric radiology: An interview with Dr Georgia Papaioannou, head of the paediatric radiology department at the 'Mitera' Children's Hospital in Athens, Greece," Experimental and Therapeutic Medicine, vol. 18, no. 4, pp. 3238–3242, 2019.
[5] H. J. Zar, S. Andronikou, and M. P. Nicol, "Advances in the diagnosis of pneumonia in children," BMJ, vol. 358, p. j2739, 2017.
[6] P. Garcia-Peña, "Chest in children: what to do and when to do it," Pediatric Radiology, vol. 41, no. 1, pp. 64–64, 2011.
[7] J. G. Blickman, B. R. Parker, and P. D. Barnes, Pediatric Radiology: The Requisites E-Book, 3rd ed.
[10] American Journal of Roentgenology, vol. 207, no. 4, pp. 903–911, 2016.
[11] R. Arthur, "Interpretation of the paediatric chest x-ray," Paediatric Respiratory Reviews, vol. 1, no. 1, pp. 41–50, 2000.
[12] B. B. Thukral, "Problems and preferences in pediatric imaging," The Indian Journal of Radiology & Imaging, vol. 25, no. 4, p. 359, 2015.
[13] E. J. Sobo, "Good communication in pediatric cancer care: A culturally-informed research agenda," Journal of Pediatric Oncology Nursing, vol. 21, no. 3, pp. 150–154, 2004.
[14] R. M. Summers, "Deep learning lends a hand to pediatric radiology," 2018.
[15] H. Brody, "Medical imaging," Nature, vol. 502, pp. S81–S81, 2013.
[16] S. K. Zhou, H. Greenspan, and D. Shen, Deep Learning for Medical Image Analysis. Academic Press, 2017.
[17] C. Chen, Representing Scientific Knowledge: The Role of Uncertainty. Springer, 2019.
[18] B. van Ginneken, "Fifty years of computer analysis in chest imaging: rule-based, machine learning, deep learning," Radiological Physics and Technology, vol. 10, no. 1, pp. 23–32, 2017.
[19] R. Dechter, Learning While Searching in Constraint-Satisfaction Problems. University of California, Computer Science Department, Cognitive Systems . . . , 1986.
[20] H. Salehinejad, E. Colak, T. Dowdell, J. Barfett, and S. Valaee, "Synthesizing chest x-ray pathology for training deep convolutional neural networks," IEEE Transactions on Medical Imaging, vol. 38, no. 5, pp. 1197–1206, 2019.
[21] H. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and R. M. Summers, "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285–1298, 2016.
[22] Y. Oh, S. Park, and J. C. Ye, "Deep learning COVID-19 features on CXR using limited training data sets," IEEE Transactions on Medical Imaging, pp. 1–1, 2020.
[23] G. Wang, W. Li, M. A. Zuluaga, R. Pratt, P. A. Patel, M. Aertsen, T. Doel, A. L. David, J. Deprest, S. Ourselin, and T. Vercauteren, "Interactive medical image segmentation using deep learning with image-specific fine tuning," IEEE Transactions on Medical Imaging, vol. 37, no. 7, pp. 1562–1573, 2018.
[24] B. H. Menze, A. Jakab, S. Bauer, and J. Kalpathy-Cramer, "The multimodal brain tumor image segmentation benchmark (BRATS)," IEEE Transactions on Medical Imaging, vol. 34, no. 10, pp. 1993–2024, 2015.
[25] A. Mansoor, J. Cerrolaza, G. Perez, E. Biggs, K. Okada, G. Nino, and M. Linguraru, "A generic approach to lung field segmentation from chest radiographs using deep space and shape learning," IEEE Transactions on Biomedical Engineering, vol. 67, no. 4, pp. 1206–1220, 2019.
[26] M. M. Moore, E. Slonimsky, A. D. Long, R. W. Sze, and R. S. Iyer, "Machine learning concepts, concerns and opportunities for a pediatric radiologist," Pediatric Radiology, vol. 49, no. 4, pp. 509–516, 2019.
[27] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez, "A survey on deep learning in medical image analysis," Medical Image Analysis, vol. 42, pp. 60–88, 2017.
[28] K. Petersen, S. Vakkalanka, and L. Kuzniarz, "Guidelines for conducting systematic mapping studies in software engineering: An update," Information and Software Technology, vol. 64, pp. 1–18, 2015.
[29] K. Petersen, R. Feldt, S. Mujtaba, and M. Mattsson, "Systematic mapping studies in software engineering," in EASE, vol. 8, 2008, pp. 68–77.
[30] B. A. Kitchenham, D. Budgen, and O. P. Brereton, "The value of mapping studies - a participant-observer case study," in EASE, vol. 10, 2010, pp. 25–33.
[31] ——, "Using mapping studies as the basis for further research - a participant-observer case study," Information and Software Technology, vol. 53, no. 6, pp. 638–651, 2011.
[32] K. R. Felizardo, E. Y. Nakagawa, S. C. P. F. Fabbri, and F. C. Ferrari, Revisão sistemática da literatura em Engenharia de Software: Teoria e prática. Elsevier Brasil, 2017.
[33] K. Yasaka and O. Abe, "Deep learning and artificial intelligence in radiology: Current applications and future directions," PLoS Medicine, vol. 15, no. 11, p. e1002707, 2018.
[34] S. M. Lee, J. B. Seo, J. Yun, Y.-H. Cho, J. Vogel-Claussen, M. L. Schiebler, W. B. Gefter, E. J. Van Beek, J. M. Goo, K. S. Lee et al., "Deep learning applications in chest radiography and computed tomography: Current state of the art," Journal of Thoracic Imaging, vol. 34, no. 2, pp. 75–85, 2019.
[35] N. Tajbakhsh, L. Jeyaseelan, Q. Li, J. N. Chiang, Z. Wu, and X. Ding, "Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation," Medical Image Analysis.
[36] The International Journal of Tuberculosis and Lung Disease, vol. 20, no. 9, pp. 1226–1230, 2016.
[37] S. Keele et al., "Guidelines for performing systematic literature reviews in software engineering," EBSE Technical Report, Ver. 2.3, 2007.
[38] S. Fabbri, C. Silva, E. Hernandes, F. Octaviano, A. Di Thommazo, and A. Belgamo, "Improvements in the StArt tool to better support the systematic review process," in Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering. ACM, 2016, p. 21.
[39] M. Pai, M. McCulloch, J. D. Gorman, N. Pai, W. Enanoria, G. Kennedy, P. Tharyan, and J. J. Colford, "Systematic reviews and meta-analyses: an illustrated, step-by-step guide," The National Medical Journal of India, vol. 17, no. 2, pp. 86–95, 2004.
[40] J. Enríquez, L. Morales-Trujillo, F. Calle-Alonso, F. Domínguez-Mayo, and J. Lucas-Rodríguez, "Recommendation and classification systems: A systematic mapping study," Scientific Programming, vol. 2019, 2019.
[41] S. Fabbri, C. Silva, E. Hernandes, F. Octaviano, A. Di Thommazo, and A. Belgamo, "Improvements in the StArt tool to better support the systematic review process," in Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering. ACM, 2016, p. 21.
[42] T. N. Kudo, R. F. Bulcão-Neto, and A. M. Vincenzi, "Requirement patterns: a tertiary study and a research agenda," IET Software, 2019.
[43] G. Liang and L. Zheng, "A transfer learning method with deep residual network for pediatric pneumonia diagnosis," Computer Methods and Programs in Biomedicine, vol. 187, 2020.
[44] A. Saraiva, N. Fonseca Ferreira, L. De Sousa, J. Costa, N.C., J. Sousa, D. Santos, A. Valente, and S. Soares, "Classification of images of childhood pneumonia using convolutional neural networks," in BIOIMAGING 2019 - 6th International Conference on Bioimaging, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019, 2019, pp. 112–119.
[45] D. Kermany, M. Goldbaum, W. Cai, C. Valentim, H. Liang, S. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan, J. Dong, M. Prasadha, J. Pei, M. Ting, J. Zhu, C. Li, S. Hewett, J. Dong, I. Ziyar, A. Shi, R. Zhang, L. Zheng, R. Hou, W. Shi, X. Fu, Y. Duan, V. Huu, C. Wen, E. Zhang, C. Zhang, O. Li, X. Wang, M. Singer, X. Sun, J. Xu, A. Tafreshi, M. Lewis, H. Xia, and K. Zhang, "Identifying medical diagnoses and treatable diseases by image-based deep learning," Cell, vol. 172, no. 5, pp. 1122–1131.e9, 2018.
[46] V. Chouhan, S. Singh, A. Khamparia, D. Gupta, P. Tiwari, C. Moreira, R. Damaševičius, and V. de Albuquerque, "A novel transfer learning based approach for pneumonia detection in chest x-ray images," Applied Sciences (Switzerland), vol. 10, no. 2, 2020.
[47] O. Stephen, M. Sain, U. Maduh, and D.-U. Jeong, "An efficient deep learning approach to pneumonia classification in healthcare," Journal of Healthcare Engineering, vol. 2019, 2019.
[48] L. Faes, S. Wagner, D. Fu, X. Liu, E. Korot, J. Ledsam, T. Back, R. Chopra, N. Pontikos, C. Kern, G. Moraes, M. Schmid, D. Sim, K. Balaskas, L. Bachmann, A. Denniston, and P. Keane, "Automated deep learning design for medical image classification by health-care professionals with no coding experience: a feasibility study," The Lancet Digital Health, vol. 1, no. 5, pp. e232–e242, 2019.
[49] R. Siddiqi, "Automated pneumonia diagnosis using a customized sequential convolutional neural network," in ACM International Conference Proceeding Series, 2019, pp. 64–70.
[50] X. Yi, S. Adams, P. Babyn, and A. Elnajmi, "Automatic catheter and tube detection in pediatric x-ray images using a scale-recurrent network and synthetic data," Journal of Digital Imaging, vol. 33, no. 1, pp. 181–190, 2020.
[51] A. Mansoor, G. Perez, G. Nino, and M. Linguraru, "Automatic tissue characterization of air trapping in chest radiographs using deep neural networks," in Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, vol. 2016-October, 2016, pp. 97–100.
[52] X. Gu, L. Pan, H. Liang, and R. Yang, "Classification of bacterial and viral childhood pneumonia using deep learning in chest radiography," in ACM International Conference Proceeding Series, 2018, pp. 88–93.
[53] K. Prayogo, A. Suryadibraya, and J. Young, "Classification of pneumonia from x-ray images using siamese convolutional network," Telkomnika (Telecommunication Computing Electronics and Control), vol. 18, no. 3, pp. 1302–1309, 2020.
[54] T. Kim, P. Yi, J. Wei, J. Shin, G. Hager, F. Hui, H. Sair, and C. Lin, "Deep learning method for automated classification of anteroposterior and posteroanterior chest radiographs," Journal of Digital Imaging, vol. 32, no. 6, pp. 925–930, 2019.
[55] E. Zucker, Z. Barnes, M. Lungren, Y. Shpanskaya, J. Seekins, S. Halabi, and D. Larson, "Deep learning to automate Brasfield chest radiographic scoring for cystic fibrosis," Journal of Cystic Fibrosis, vol. 19, no. 1, pp. 131–138, 2020.
[56] H. Behzadi-khormouji, H. Rostami, S. Salehi, T. Derakhshande-Rishehri, M. Masoumi, S. Salemi, A. Keshavarz, A. Gholamrezanezhad, M. Assadi, and A. Batouli, "Deep learning, reusable and problem-based architectures for detection of consolidation on chest x-ray images," Computer Methods and Programs in Biomedicine, vol. 185, 2020.
[57] J. Ureta, O. Aran, and J. Rivera, "Detecting pneumonia in chest radiographs using convolutional neural networks," in Proceedings of SPIE - The International Society for Optical Engineering, vol. 11433, 2020.
[58] G. Labhane, R. Pansare, S. Maheshwari, R. Tiwari, and A. Shukla, "Detection of pediatric pneumonia from chest x-ray images using CNN and transfer learning," 2020, pp. 85–92.
[59] L. Li, M. Doroslovački, and M. Loew, "Discriminant analysis deep neural networks," 2019.
[60] M. Hu, H. Lin, Z. Fan, W. Gao, L. Yang, C. Liu, and Q. Song, "Learning to recognize chest-xray images faster and more efficiently based on multi-kernel depthwise convolution," IEEE Access, vol. 8, pp. 37265–37274, 2020.
[61] A. Mansoor, G. Nino, G. Perez, and M. Linguraru, "Lungair: An automated technique to predict hospitalization due to LRTI using fused information," in Proceedings of SPIE - The International Society for Optical Engineering, vol. 10975, 2018.
[62] A. Mansoor, J. Cerrolaza, G. Perez, E. Biggs, G. Nino, and M. Linguraru, "Marginal shape deep learning: Applications to pediatric lung field segmentation," in Progress in Biomedical Optics and Imaging - Proceedings of SPIE, vol. 10133, 2017.
[63] Z. Rustam, R. Yuda, H. Alatas, and C. Aroef, "Pulmonary rontgen classification to detect pneumonia disease using convolutional neural networks," Telkomnika (Telecommunication Computing Electronics and Control), vol. 18, no. 3, pp. 1522–1528, 2020.
[64] W. Zhang, G. Li, F. Wang, L. E, Y. Yu, L. Lin, and H. Liang, "Simultaneous lung field detection and segmentation for pediatric chest radiographs," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11769 LNCS, pp. 594–602, 2019.
[65] B. Narayanan, V. Davuluru, and R. Hardie, "Two-stage deep learning architecture for pneumonia detection and its diagnosis in chest radiographs," in Progress in Biomedical Optics and Imaging - Proceedings of SPIE, vol. 11318, 2020.
[66] E. Longjiang, B. Zhao, Y. Guo, C. Zheng, M. Zhang, J. Lin, Y. Luo, Y. Cai, X. Song, and H. Liang, "Using deep-learning techniques for pulmonary-thoracic segmentations and improvement of pneumonia diagnosis in pediatric chest radiographs," Pediatric Pulmonology, vol. 54, no. 10, pp. 1617–1626, 2019.
[67] S. Rajaraman, S. Candemir, G. Thoma, and S. Antani, "Visualizing and explaining deep learning predictions for pneumonia detection in pediatric chest radiographs," in Progress in Biomedical Optics and Imaging - Proceedings of SPIE, vol. 10950, 2019.
[68] A. Shrestha and A. Mahmood, "Review of deep learning algorithms and architectures," IEEE Access, vol. 7, pp. 53040–53065, 2019.
[69] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in European Conference on Computer Vision. Springer, 2014, pp. 740–755.
[70] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, "ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2097–2106.
[71] A. Bustos, A. Pertusa, J.-M. Salinas, and M. de la Iglesia-Vayá, "PadChest: A large chest x-ray image dataset with multi-label annotated reports," arXiv preprint arXiv:1901.07441, 2019.
[72] G. De Lacey, S. Morley, and L. Berman, The Chest X-Ray: A Survival Guide E-Book. Elsevier Health Sciences, 2012.
[73] C. Sun, A. Shrivastava, S. Singh, and A. Gupta, "Revisiting unreasonable effectiveness of data in deep learning era," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 843–852.
[74] S. Schroter, R. Plowman, A. Hutchings, and A. Gonzalez, "Reporting ethics committee approval and patient consent by study design in five general medical journals," Journal of Medical Ethics, vol. 32, no. 12, pp. 718–723, 2006.
[75] K. H. Jacobsen, Introduction to Health Research Methods: A Practical Guide. Jones & Bartlett Publishers, 2020, p. 166.
[76] G. J. De Moor, C. McDonald, and J. N. van Goor, Progress in Standardization in Health Care Informatics.
[78] Requirements Engineering, vol. 11, no. 1, pp. 102–107, Mar. 2006. [Online]. Available: https://doi.org/10.1007/s00766-005-0021-6
[79] A. Kendall and Y. Gal, "What uncertainties do we need in Bayesian deep learning for computer vision?" in Advances in Neural Information Processing Systems, 2017, pp. 5574–5584.
[80] A. Karpatne, W. Watkins, J. Read, and V. Kumar, "Physics-guided neural networks (PGNN): An application in lake temperature modeling," arXiv preprint arXiv:1710.11431, 2017.
[81] S.-C. Lo, S.-L. Lou, J.-S. Lin, M. T. Freedman, M. V. Chien, and S. K. Mun, "Artificial convolution neural network techniques and applications for lung nodule detection," IEEE Transactions on Medical Imaging, vol. 14, no. 4, pp. 711–718, 1995.
[82] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[83] Y. Feng, H. S. Teh, and Y. Cai, "Deep learning for chest radiology: A review,"