Vulnerability Analysis of 2500 Docker Hub Images
Katrine Wist
Dep. of Inf. Sec. and Comm. Techn., Norwegian University of Science and Technology (NTNU), [email protected]
Malene Helsem
Dep. of Inf. Sec. and Comm. Techn., Norwegian University of Science and Technology (NTNU), [email protected]
Danilo Gligoroski
Dep. of Inf. Sec. and Comm. Techn., Norwegian University of Science and Technology (NTNU), [email protected]
Abstract—The use of container technology has skyrocketed during the last few years, with Docker as the leading container platform. Docker's online repository for publicly available container images, called Docker Hub, hosts over 3.5 million images at the time of writing, making it the world's largest community of container images. We perform an extensive vulnerability analysis of 2500 Docker images. It is of particular interest to perform this type of analysis because the vulnerability landscape changes rapidly: vulnerability scanners are constantly developed and updated, new vulnerabilities are discovered, and the volume of images on Docker Hub is increasing every day. Our main findings reveal that (1) the number of newly introduced vulnerabilities on Docker Hub is rapidly increasing; (2) certified images are the most vulnerable; (3) official images are the least vulnerable; (4) there is no correlation between the number of vulnerabilities and image features (i.e., number of pulls, number of stars, and days since the last update); (5) the most severe vulnerabilities originate from two of the most popular scripting languages, JavaScript and Python; and (6) Python 2.x packages and jackson-databind packages contain the highest number of severe vulnerabilities. We perceive our study as the most extensive vulnerability analysis published in the open literature in the last couple of years.
Index Terms—Container technology, Docker, Virtual Machines, Vulnerabilities
1. Introduction
Container technology has been known for a long time in Linux systems through Linux Containers (LXC), but it was not commonly used until a decade ago. The introduction of Docker in [1] made the popularity of containerization rise exponentially. Container technology has revolutionized how software is developed and is seen as a paradigm shift. More concretely, containerization is considered a beneficial technique for Continuous Integration/Continuous Delivery (CI/CD) pipelines; it provides an effective way of organizing microservices; it makes it easy to move an application between different environments; and, in general, it simplifies the whole system development life cycle.

Software containers got their name from the shipping industry, since the concepts are fundamentally the same. A software container is code wrapped up with all its dependencies so that the code can run reliably and seamlessly in any computer environment, isolated from other processes. Hence, containers are a convenient, lightweight, and fast technology for achieving isolation, portability, and scalability.

Container technology is steadily replacing virtual machines, and the trend is that more companies are choosing to containerize their applications. Gartner predicts that more than 70% of global companies will have more than two containerized applications in production by 2023, an increase from less than 20% in 2019. With the advent of 5G communication technology, it seems that container technology, and particularly Docker, is finding a new venue for application in the domain of network slicing, network management, orchestration, and 5G testbeds [2].

Docker provides a popular registry service for the sharing of Docker images, called Docker Hub. It currently hosts over 3.5 million container images, and the number keeps growing. Images can be uploaded and maintained by anyone, which creates an innovative environment where anyone can contribute and participate.
However, on the downside, this makes it hard for Docker to ensure that packages and applications are kept up to date to avoid outdated and vulnerable software.

When looking at the security of Docker, two aspects need to be considered: the security of the Docker software at the host, and the security of the Docker containers. Docker Inc. claims that “Docker containers are, by default, quite secure; especially if you run your processes as non-privileged users inside the container.” [3]. However, it is a simple fact that Docker (the Docker daemon and container processes) runs with root privileges by default, which exposes a huge attack surface [4]. A single vulnerable container is enough for an adversary to achieve privilege escalation. Hence, the security of the whole Docker ecosystem is highly related to the vulnerability landscape in Docker images.
Related work.
One of the first to explore the vulnerability landscape of Docker Hub was BanyanOps [5]. In 2015, they published a technical report revealing that 36% of official images on Docker Hub contained high priority vulnerabilities [5]. Further, they discovered that this number increases to 40% when community images (or general images, as they are called in the report) are analyzed. BanyanOps built their own vulnerability scanner based on Common Vulnerabilities and Exposures (CVE) scores, and analyzed all official images (≈75 repositories with ≈960 unique images) and some randomly chosen community images. However, at that time, Docker Hub consisted of just a small fraction of today's ≈3.5 million images.

1. Gartner: 3 Critical Mistakes That I&O Leaders Must Avoid With Containers
2. Docker Hub webpage: https://hub.docker.com/

Our contribution.
This is an extended summary of our longer and much more detailed work [8]. We scrutinized the vulnerability landscape in Docker Hub images at the beginning of 2020 within the following framework:
• Images on Docker Hub belong to one of the following four types: "official", "verified", "certified", or "community";
• We used a quantitative mapping of the Common Vulnerability Scoring System (CVSS) [9] (a numerical score indicating the severity of the vulnerability on a scale from 0.0 to 10.0) into five qualitative severity rating levels: "critical", "high", "medium", "low", or "none", plus one additional level, "unknown".
For performing the analysis of a significant number of images, we used an open-source vulnerability scanner tool and developed our own scripts and tools. All our developed scripts and tools are available from [8] and from
Image type | 2015 [5]  | 2017 [6]  | 2019 [7]  | 2020 (ours)
           | vuln  avg | vuln  avg | vuln  avg | vuln  avg
Official   | 36%   -   | 80%   75  | -     170 | 46%   70
Community  | 40%   -   | 80%   180 | -     150 | 68%   150
Verified   | -     -   | -     -   | -     150 | 57%   90
Certified  | -     -   | -     -   | -     30  | 82%   90
the GitHub repository.

TABLE 1: A summary comparison table of results reported in 2015 [5], in 2017 [6], in 2019 [7], and in our work (2020). The sub-columns "vuln" contain the percentage of images with at least one high rated vulnerability, and the "avg" sub-columns contain the average number of vulnerabilities found in each image type.

Our findings can be summarized as follows: The median value (when omitting the negligible and unknown vulnerabilities) is 26 vulnerabilities per image. Most of the vulnerabilities were found in the medium severity category. Around 17.8% (430 images) do not contain any vulnerabilities, and if we consider negligible and unknown vulnerabilities as no vulnerability, the number increases to as much as 21.6% (523 images). As intuitively expected, when considering the average, community images are the most exposed. We found that 8 out of the top 10 most vulnerable images are community images. However, to our surprise, the certified images are the most vulnerable when considering the median value. They had the most high rated vulnerabilities as well as the most vulnerabilities rated as low. As many as 82% of certified images contain at least one high or critical vulnerability. Official images come out as the most secure image type; around 45.9% of them contain at least one critical or high rated vulnerability. The median number of critical vulnerabilities in images is almost identical for all four image types. Verified and official images are the most updated, and community and certified images are the least updated. Approximately 30% of images have not been updated for the last 400 days. There is no correlation between the number of vulnerabilities and the evaluated image features (i.e., the number of pulls, the number of stars, and the last update time). However, the images with many vulnerabilities generally have few pulls and stars.
Vulnerabilities in the Lodash library and vulnerabilities in Python packages are the most frequent and the most severe. The top five most severe vulnerabilities come from two of the most popular scripting languages, JavaScript and Python. Vulnerabilities related to execution of code and overflow are the most frequently found critical vulnerabilities. The most vulnerable package is the jackson-databind-2.4.0 package, with an overwhelming 710 critical vulnerabilities, followed by Python-2.7.5 with 520 critical vulnerabilities.

Last but not least, our results compared with the three previous similar studies [5]–[7] are summarized in Table 1. Note that some of the cells are empty due to differences in methodologies and in the types of images that existed when the studies were performed.
3. https://github.com/katrinewi/Docker-image-analyzing-tools

Repository type | Quantity
Official        | 160
Verified        | 250
Certified       | 51
Community       | 3,064,454
Total           | 3,064,915

TABLE 2: The distribution of image repository types on Docker Hub.
2. Preliminaries
Virtualization is the technique of creating a virtual abstraction of some resources to make multiple instances run isolated from each other on the same hardware [10]. There are different approaches to achieving virtualization. One approach is using Virtual Machines (VMs). A VM is a virtualization of the hardware at the host. Hence, each VM has its own kernel, and in order to manage the different VMs, a piece of software called a hypervisor is required. The hypervisor emulates the Central Processing Unit (CPU), storage, and Random-Access Memory (RAM), among others, for each virtual machine. This allows multiple virtual machines to run as separate machines on a single physical machine.

In contrast to VMs, containers virtualize at the Operating System (OS) level. Every container running on the same machine shares the same underlying kernel, and only bins, libraries, and other run-time components are executed exclusively for a single container. In short, a container is a standardized unit of software that contains all code and dependencies [11]. Thus, containers require less memory and achieve a higher level of portability than VMs. Container technology has simplified the software development process, as the code is portable, and hence what is run in the development department will be the same as what is run in the production department [12].

On Docker Hub, image repositories are divided into different categories. Repositories are either private or public, and can further be either an official, community, or verified repository. In addition, repositories can be certified, which is a subsection of the verified category. The official repositories are maintained and vetted by Docker. Docker vets the verified ones, which are developed by third-party developers. Besides being verified, certified images also fulfill some other requirements related to quality, support, and best practices [13]. Community images can be uploaded and maintained by anyone. The distribution of the image repository types on Docker Hub can be seen in Table 2. The community repository category is by far the most dominant one and makes up ≈99% of all Docker Hub repositories.
The severity of vulnerabilities depends on a variety of different variables, and it is highly complex to compare them due to the diversity of technologies and solutions. Already in 1997, the National Vulnerability Database (NVD) started working on a database that would contain publicly known software vulnerabilities, to provide a means of understanding future trends and current patterns [14]. The database can be useful in the field of security management when deciding what software is safe to use, and for predicting whether or not software contains vulnerabilities that have not yet been discovered.
Common Vulnerabilities and Exposures (CVE).
The National Vulnerability Database (NVD) contains Common Vulnerabilities and Exposures (CVE) entries and provides details about each vulnerability, such as a vulnerability overview, the Common Vulnerability Scoring System (CVSS) score, references, Common Platform Enumeration (CPE), and Common Weakness Enumeration (CWE) [15].

CVE is widely used as a method for referencing security vulnerabilities that are publicly known in released software packages. At the time of writing, there were 130,094 entries in the CVE list. The CVE list was created by MITRE Corporation in 1999, whose role is to manage and maintain the list. They work as a neutral and unbiased party in order to serve the interest of the public. Examples of vulnerabilities found in CVE are frequent errors, faults, flaws, and loopholes that can be exploited by a malicious user in order to get unauthorized access to a system or server. The loopholes can also be used as propagation channels for viruses and worms that contain malicious software [16]. Over the years, CVE has become a recognized building block for various vulnerability analysis and security information exchange systems, much because it is continuously maintained and updated, and because the information is stored with accurate enumeration and orderly naming.

Figure 1: Common Vulnerability Scoring System structure [9]

Common Vulnerability Scoring System (CVSS).
The Common Vulnerability Scoring System (CVSS) score is a numerical score indicating the severity of the vulnerability on a scale from zero to 10, based on a variety of metrics. The metrics are divided into three metric groups: the Base Metric Group, the Temporal Metric Group, and the Environmental Metric Group. A Base Score is calculated from the metrics in the Base Metric Group; it is independent of the user environment and does not change over time. The Temporal Metrics take the base score and adjust it according to factors that do change over time, such as the availability of exploit code [9]. The Environmental Metrics adjust the score yet again, based on the type of computing environment. This allows organizations to adjust the score to their IT assets, taking into account existing mitigations and security measures that are already in place in the organization.

In our analysis, it would not make sense to take into account the Temporal or Environmental Metrics, as we wanted to discuss the vulnerability landscape independently of the exact time and environment. Therefore, only the Base Metric Group is described in more detail. It is composed of two sets of metrics: the Exploitability metrics and the Impact metrics, as can be seen in Figure 1 [9]. The first set takes into account how the vulnerable component can be exploited and includes the attack vector and complexity, what privileges are required to perform the attack, and whether or not user interaction is required. The latter set reflects the consequences of a successful exploit and the impact it has on the confidentiality, integrity, and availability of the system. The last metric is scope, which considers whether the vulnerability can propagate outside the current security scope.

When the Base Score of a vulnerability is calculated, the eight different metrics from Figure 1 are considered. Each metric is assigned one out of two to four different values, which are used to generate a vector string. The vector string is then used to calculate the CVSS score, a numerical value between 0 and 10. In many cases, it is more beneficial to have a textual value than a numerical one. The CVSS score can therefore be mapped to qualitative ratings, where the severity is categorized as either critical, high, medium, low, or none, as can be seen in Table 3 [9].

Rating   | CVSS Score
None     | 0.0
Low      | 0.1 - 3.9
Medium   | 4.0 - 6.9
High     | 7.0 - 8.9
Critical | 9.0 - 10.0

TABLE 3: CVSS severity scores.

4. The number of entries in the CVE list was retrieved 28 January 2020 from the official website: https://cve.mitre.org
5. MITRE Corporation is a non-profit US organization with the vision to resolve problems for a safer world.
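The mapping from a numerical CVSS score to a qualitative rating in Table 3 is a simple threshold lookup; the following sketch illustrates it (the function name is ours, not part of the paper's tooling):

```python
def cvss_rating(score: float) -> str:
    """Map a CVSS base score (0.0-10.0) to its qualitative severity
    rating, following the thresholds in Table 3."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"

# Example: CVE-2019-10744 (discussed in Section 5) has base score 9.8.
print(cvss_rating(9.8))  # Critical
```

Because CVSS scores are reported to one decimal place, comparing against the upper bound of each band is sufficient.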
3. Docker Hub vulnerability landscape
To determine what the current vulnerability landscape in Docker Hub is like, the number of vulnerabilities found in each severity category is presented in Figure 2. As it is interesting to see both how many vulnerabilities are found in total (Figure 2a) and how many unique vulnerabilities there are (Figure 2b), both results are presented in this section.

In Figure 2a, the results are based on vulnerability scanning of the complete data set, meaning that this result is based on all found vulnerabilities. The same vulnerability could potentially have multiple entries in the result, because a particular vulnerability could be found in multiple images, and a single image could contain the same vulnerability in multiple packages. In Figure 2b, only unique vulnerabilities are shown. However, some vulnerabilities are present in several severity categories, depending on which image they are found in. In such cases, all versions of the vulnerability are included, which makes up a total of 14,032 vulnerabilities.

In Figure 2a, the negligible and unknown categories clearly stand out, with a total of 315,102 and 240,132 vulnerabilities, respectively. When considering unique vulnerabilities (Figure 2b), the medium category is the most dominant one, with 5,554 unique vulnerabilities. When examining the relation between Figures 2a and 2b, one can observe the ratio of vulnerabilities between severity categories. It becomes clear that the negligible category contains a small number of unique vulnerabilities represented in many Docker images, whereas the medium category has many unique vulnerabilities represented at a lower ratio. The vulnerability ratio is explained in detail in the next paragraph.
Severity   | Number of vulnerabilities (A) | Number of unique vulnerabilities (B) | Ratio (A/B)
Critical   | 10,378  | 206    | 50
High       | 44,058  | 1,313  | 34
Medium     | 171,832 | 5,554  | 31
Low        | 137,290 | 2,326  | 59
Negligible | 315,102 | 959    | 329
Unknown    | 240,132 | 3,674  | 65
Total      | 918,792 | 14,032 | -

TABLE 4: Vulnerability frequency in severity levels.

Table 4 shows the total number of vulnerabilities, the number of unique vulnerabilities, and the ratio, measured as the total number of vulnerabilities divided by the number of unique vulnerabilities. So, for each unique vulnerability, there is a certain number of occurrences of that specific vulnerability in the data set. For example, for each unique vulnerability in the critical category, there are on average 50 occurrences of this vulnerability in the data set. For each unique negligible vulnerability, there are as many as 329 occurrences on average, which is significantly larger than the other values. Despite medium having the highest number of unique vulnerabilities, it has the lowest ratio.
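The total, unique, and ratio columns of Table 4 amount to counting all scan findings versus deduplicated CVE IDs. A minimal sketch with made-up findings (the data layout is illustrative, not the paper's actual scanner output format):

```python
# Illustrative scan findings: (image, package, cve_id) triples.
# The same CVE counts once per package per image in the total,
# but only once in the unique count.
findings = [
    ("img-a", "lodash-4.17.4", "CVE-2019-10744"),
    ("img-a", "lodash-3.10.1", "CVE-2019-10744"),  # same CVE, second package
    ("img-b", "lodash-4.17.4", "CVE-2019-10744"),  # same CVE, another image
    ("img-b", "python-2.7.5", "CVE-2019-9636"),
]

total = len(findings)                          # every occurrence counts (A)
unique = len({cve for _, _, cve in findings})  # deduplicated CVE IDs (B)
ratio = total / unique                         # average occurrences per CVE
print(total, unique, ratio)  # 4 2 2.0
```

Applied per severity category, this yields the three columns of Table 4.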
We have looked at the average and median values of the number of vulnerabilities in images when disregarding the vulnerabilities that are categorized as negligible and unknown. Looking at Table 4 from the previous section, one can see that negligible and unknown vulnerabilities together make up 555,234 out of the 918,792 vulnerabilities (around 60%). As vulnerabilities in these two categories are considered to pose little threat, excluding them gives a more accurate picture of the current vulnerability landscape. Therefore, we calculated the average and median number of vulnerabilities in images when disregarding negligible and unknown vulnerabilities (counting them as zero). The result was 151 for the average and 26 for the median.

To investigate the data further when disregarding the negligible and unknown vulnerabilities, we created Table 5, which shows statistical values for the number of vulnerabilities for each image type. The results show that community images have the highest average and maximum values

Figure 2: Vulnerability distribution in severity levels; (a) distribution of all 918,792 vulnerabilities, (b) distribution of 14,032 unique vulnerabilities
Image type | Number of analyzed images | Number of vulnerabilities | Average | Median | Max
Verified   | 60    | 6,073   | 101.2 | 13 | 1,128
Certified  | 22    | 1,987   | 90.3  | 37 | 428
Official   | 157   | 11,489  | 73.2  | 9  | 1,615
Community  | 2,173 | 344,009 | 158.3 | 28 | 6,509

TABLE 5: Statistical values for vulnerabilities per image type, disregarding negligible and unknown vulnerabilities.

(158, and 6,509, respectively). The maximum value for community images is significantly larger than the average and the median, which is the case for the other three image types as well. The image type considered the least vulnerable is official: it has the lowest average, 73, and the lowest median value, 9. Further, the maximum value for official images is the second lowest. The lowest maximum value belongs to certified, at only 428. Although certified has the lowest maximum value, it has the highest median value, which indicates that a larger portion of these images have many vulnerabilities. As a final note, all four image types contain at least one image with zero vulnerabilities.
Since the median describes the central tendency better than the average when the data is skewed, we will here work with the median values (given in Figure 3). Note that only critical, high, medium, and low vulnerabilities are included in the figure. The negligible and unknown vulnerabilities are not included, because they do not usually pose significant threats and therefore do not contribute additional information when investigating the current vulnerability landscape.

The results show that the median number of critical vulnerabilities is almost the same for all four image types (4.0 and 3.0). The other severity categories vary more across the image types. The high severity category is the most represented in certified images, while the medium category is the most represented in community images. For verified, official, and community images, the medium severity has the highest median, while certified images have the most low vulnerabilities. Overall, the certified images are the most vulnerable.

Figure 3: Median values of vulnerabilities for each severity category and image type
Out of all 2,412 successfully analyzed images, this section presents the most vulnerable ones. Table 6 displays the most vulnerable images based on the number of critical vulnerabilities in each image. In cases where the critical count is the same, the image with the highest number of high rated vulnerabilities is considered the most vulnerable one. The Number of pulls column denotes the total number of pulls (downloads) for each image.

Out of the top 10 most vulnerable images, there are 8 community images, 1 official image (silverpeas), and 1 verified image (microsoft-mmlspark-release). There are big variations in the number of vulnerabilities in all presented severity levels. The most vulnerable image, pivotaldata/gpdb-pxf-dev, has ~250 more critical vulnerabilities than the second most vulnerable image. However, the second most vulnerable image, cloudera/quickstart, contains as many as 2,155 high rated vulnerabilities, which is ~1,500 more than the image ranked as the most vulnerable. We chose to focus on the critical vulnerabilities in the ranking of the most vulnerable images, because this is the highest possible rating and hence the most severe vulnerabilities are found in this category. The other severity categories are included in the table as extra information and to give a clear view of the distribution of vulnerabilities. From the Number of pulls column, one can observe that the most vulnerable image is also the most downloaded one out of the top 10, with as many as 139,246,839 pulls. This is approximately 100 million more pulls compared to the second most pulled image on this list (the raphacps/simpsons-maven-repo image). There is no immediate correlation that can be observed between the number of pulls and the number of vulnerabilities in these images.

TABLE 6: The most vulnerable images sorted by critical count (columns: Image, Critical, High, Medium, Low, Number of pulls).

Figure 4: The percentage of images that contain at least one high or critical rated vulnerability.
A single vulnerability is enough for a system to be compromised. Thus, we determine what percentage of images contain at least one high or critical rated vulnerability for each image type, as shown in Figure 4.

Our results (Figure 4) reveal that the certified image type, which is a subsection of the verified image type, is the most vulnerable by this measure: 81.8% of all certified images contain at least one vulnerability with high severity level, and 72.7% of them contain at least one critical vulnerability. Community images come out as the second most vulnerable image type: 67.4% have high vulnerabilities and 45.1% have critical vulnerabilities. The third most vulnerable image type is verified, followed by official.

When combining these results to investigate what fraction of each image type contains at least one critical or high rated vulnerability, the results are as follows: 81.8% for certified images, 68.4% for community images, 56.7% for verified images, and 45.9% for official images. This makes official images the least vulnerable image type. However, it should be emphasized that still almost half of the official images contain critical or high rated vulnerabilities.
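The per-type percentages behind Figure 4 amount to counting, for each image type, the fraction of images whose scan result contains at least one finding at the given severity level. A minimal sketch with made-up data (image names and the data layout are illustrative):

```python
from collections import defaultdict

# Illustrative: image name -> (image type, set of severity levels found).
images = {
    "img-1": ("official",  {"medium", "low"}),
    "img-2": ("official",  {"critical", "high"}),
    "img-3": ("community", {"high"}),
    "img-4": ("community", {"negligible"}),
}

def pct_with(levels):
    """Percentage of images per type containing at least one of `levels`."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for img_type, severities in images.values():
        total[img_type] += 1
        if severities & levels:  # any overlap with the requested levels
            hits[img_type] += 1
    return {t: 100.0 * hits[t] / total[t] for t in total}

print(pct_with({"critical", "high"}))  # {'official': 50.0, 'community': 50.0}
```

Running this over the full scan results, once per severity level of interest, produces the bars of Figure 4.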
This section focuses on the trend of all reported Common Vulnerabilities and Exposures (CVE) vulnerabilities each year compared to the number of unique CVE vulnerabilities found throughout our analysis. Data gathered from the CVE Details database [17] is used to display the number of newly reported CVE vulnerabilities each year.

In Figure 5a, the reported CVE vulnerabilities each year are presented together with the unique CVE vulnerabilities found in our analysis from 2010 to 2019. The orange line shows how the number of newly discovered CVE vulnerabilities varies by a few thousand vulnerabilities each year. However, there is a significant increase in 2017. This increase is not reflected in the data from our analysis, which follows a steady increase in the years from 2014 to 2017. This increase can be explained by the introduction of Docker Hub in 2014, making new vulnerabilities more represented in images. As a final observation, the number of newly reported vulnerabilities from MITRE decreased between 2018 and 2019, while there is an increase in our results.

Figure 5b shows the number of unique vulnerabilities found in each image type (i.e., community, official, verified, and certified) in our analysis from 2010 to 2019. This figure gives an insight into how the overall changes are reflected in each image type. Verified and certified images have had an increase in the number of unique CVE vulnerabilities each year since 2015. Community and official images, however, have had a significant decrease in unique vulnerabilities from 2017 to 2018. It is noteworthy that the curves are affected by the time of introduction of the different image types: official images were introduced in 2014, whereas verified and certified images were introduced in 2018.

Figure 5: CVE trend from 2010 to 2019; (a) displays all reported CVEs and all unique CVEs found in our analysis, (b) displays the CVEs in the different image types from our analysis.
There is high variation in how often Docker Hub images are updated. Intuitively, this affects the vulnerability landscape of Docker Hub. Hence, we gathered data about when images were last updated and calculated the number of days since each image was last updated, counting back from February 25th, 2020. The data set consists of last-updated data for all analyzed images except five.

A brief analysis of the numbers from our database revealed that 31.4% of images have not been updated in 400 days or longer, and 43.8% have not been updated in 200 days or longer. The percentage of images that have been updated during the last 14 days is 29.8%. This implies that, if these numbers are representative for all images on Docker Hub, a third of the images (31.4%) on Docker Hub have not been updated in the last 400 days or longer.

In more detail, Table 7 presents how often images in each of the image types are updated. Community and certified images are the least updated image categories: 47.0% of community images and 36.4% of certified images have not been updated for 200 days or more. The verified images are the most frequently updated category, where 83.3% of images have been updated during the last 14 days.

A handful of certified images highly affect the percentages in Table 7, because the overall number of certified images is small. Official images contain a high portion of images that have been updated recently (January 2020 to March 2020), and some more spread values with images that have not been updated since 2016. The verified images are the most updated image type; there is only one verified image whose last update was earlier than May 2019.
Image type | More than 400 days | More than 200 days | Less than 14 days
Community  | 33.9% | 47.0% | 27.0%
Official   | 9.6%  | 14.7% | 51.3%
Certified  | 18.2% | 36.4% | 13.6%
Verified   | 1.7%  | 5.0%  | 83.3%

TABLE 7: The time since last update for all image types, presented as percentages.
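The "days since last update" metric above is a plain date difference, counted back from the data-collection date (February 25th, 2020); a sketch with illustrative update dates:

```python
from datetime import date

COLLECTED = date(2020, 2, 25)  # day the image metadata was gathered

def days_since_update(last_updated, collected=COLLECTED):
    """Number of days between an image's last update and data collection."""
    return (collected - last_updated).days

# Illustrative last-update dates, then the kind of bucketing used in Table 7.
updates = [date(2020, 2, 20), date(2019, 6, 1), date(2018, 1, 1)]
ages = [days_since_update(d) for d in updates]   # ages == [5, 269, 785]
over_400 = sum(a >= 400 for a in ages) / len(ages)
print(ages, over_400)
```

The same counts, taken per image type and divided by the number of images of that type, yield the percentages in Table 7.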
4. Correlation between image features and vulnerabilities
We investigate whether or not the number of vulnerabilities in an image is affected by a specific image feature, such as the number of times the image has been pulled, the number of stars an image has been given, or the number of days since the image was last updated. In order to find out whether there is a correlation, we used Spearman's r_s correlation coefficient [18]. Spearman's correlation was chosen because our data set contains skewed values that are not normally distributed. When handling entries that contained empty values, we opted for the approach of complete case analysis, which means omitting incomplete pairs. The alternative would be imputation of missing values, which means creating an estimated value based on the other data values. However, this approach was not chosen because the values of our data set are independent of each other.

Correlation between pulls and vulnerabilities.
To check the folklore wisdom about the following correlation: images with the most pulls generally have few vulnerabilities, and images with the most vulnerabilities generally have few pulls, we created the scatter plot given in Figure 6. However, after calculating the Spearman correlation coefficient between the number of pulls and the number of vulnerabilities for the whole set of investigated images, we got a small negative r_s, which is considered as no particular correlation. To explain this, we refer to the meaning of having a high negative correlation: the markers would gather around a decreasing line (not necessarily linear), indicating that images with more pulls have a lower number of vulnerabilities. In the case of a high positive correlation, the opposite would apply, i.e., the line would be increasing.

Figure 6: Number of pulls and number of vulnerabilities for each image

Correlation between stars and vulnerabilities.
The correlation coefficient between the number of stars and the number of vulnerabilities is likewise a small negative value. Figure 7 shows the scatter plot when including the number of stars instead of the number of pulls. The plot is similar to Figure 6, but the correlation is even weaker.

Figure 7: Number of stars and number of vulnerabilities for each image

Correlation between time since last update and vulnerabilities.
This correlation is calculated by computing the number of days since the last update, counting from the day we gathered the data (February 25, 2020). The correlation was r_s = 0.1075, which is positive, as opposed to the other two. Figure 8 shows the scatter plot. The value of 0.1075 is still not enough to state that there is a strong correlation between the number of vulnerabilities and the time since the last update. The markers slightly approach an increasing line, indicating a weak tendency that there are more vulnerabilities in images that have not been updated for a long time. Still, the distribution of markers is relatively even along the x-axis, with most markers in the lower part of the y-axis, supporting that there is no correlation.

Figure 8: Number of days since last update and number of vulnerabilities for each image
5. The most severe vulnerabilities
The most represented severe vulnerabilities are, intuitively, the ones with the highest impact on the vulnerability landscape. Table 8 presents the most represented critical-rated vulnerabilities in descending order. The results are obtained by counting the number of occurrences of each vulnerability ID at the critical severity level. The critical count column gives the number of occurrences of a specific vulnerability, and the type(s) column gives the vulnerability type of each vulnerability. The type data is gathered from the CVE Details database [19].
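The counting step described above can be sketched as follows; this is an illustrative reconstruction, and the scan-result record format is an assumption rather than the paper's actual data layout:

```python
# Sketch: tally how often each CVE appears at the "Critical" severity level
# across all scan reports. The findings list is a made-up stand-in.
from collections import Counter

findings = [
    {"image": "imgA", "cve": "CVE-2019-10744",   "severity": "Critical"},
    {"image": "imgB", "cve": "CVE-2019-10744",   "severity": "Critical"},
    {"image": "imgB", "cve": "CVE-2017-1000158", "severity": "Critical"},
    {"image": "imgC", "cve": "CVE-2019-9948",    "severity": "High"},
]

# Count only findings at the critical severity level.
critical_count = Counter(
    f["cve"] for f in findings if f["severity"] == "Critical"
)

# The most represented critical vulnerabilities, in descending order (Table 8).
for cve, count in critical_count.most_common():
    print(cve, count)
```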
[TABLE 8: The most represented vulnerabilities (based on critical severity level). Columns: Vulnerability ID, Critical count, Type(s); table data not recovered.]

5.2. Vulnerability characteristics
We elaborate on the top five most represented vulnerabilities in Table 8 regarding their characteristics and common features. The top five severe vulnerabilities come from the two most popular scripting languages, JavaScript and Python. As a general observation, execute code is the most common vulnerability type, followed by overflow.

The most represented critical vulnerability is found 466 times throughout our scanning. It has vulnerability ID CVE-2019-10744 and a base score of 9.8, which is in the upper range of the critical category (for how base scores are determined, see Section 2.1). The vulnerability is related to the JavaScript library lodash, which is commonly used as a utility function provider for functional programming. This particular vulnerability is related to improper input validation and makes the software vulnerable to prototype pollution. It affects versions of lodash lower than 4.17.12 [20]. In short, this means that an adversary can execute arbitrary code by modifying the properties of Object.prototype. This is possible because most JavaScript objects inherit the properties of the built-in Object.prototype object. The fifth vulnerability on the list, CVE-2018-16487, is also related to lodash and the prototype pollution vulnerability.

Further, the second, third, and fourth most represented critical vulnerabilities are Python vulnerabilities. The second, with vulnerability ID CVE-2017-1000158, is related to versions of Python up to 2.7.13. Its base score is 9.8, and the vulnerability enables arbitrary code execution through an integer overflow leading to a heap-based buffer overflow [21]. Overflow vulnerabilities can be of different types, for instance heap overflow, stack overflow, and integer overflow. Heap overflow and stack overflow are related to overflowing a buffer, whereas an integer overflow can lead to a buffer overflow. A buffer overflow overwrites a certain allocated buffer, causing adjacent memory locations to be overwritten. Exploits of these kinds of vulnerabilities are typically related to the execution of arbitrary code, where the adversary takes advantage of the buffer overflow vulnerability to run malicious code.

The third presented vulnerability, with vulnerability ID CVE-2019-9948, affects the Python module urllib in Python versions 2.x up to 2.7.16. It is rated with a base score of 9.1. This vulnerability makes it easier to get around security mechanisms that blacklist the file: URI syntax, which in turn could give an adversary access to local files such as /etc/passwd [22]. The fourth vulnerability is found 374 times and has vulnerability ID CVE-2019-9636. It affects both the second and the third version of Python (versions 2.7.x up to 2.7.16, and 3.x up to 3.7.2) and has a base score of 9.8. This vulnerability is also related to the urllib module, more precisely, to incorrect handling of Unicode encoding. The result is that information could be sent to a different host than when the URL is parsed correctly [23].
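The affected-version ranges discussed above can be expressed as a small check. This is a hedged sketch with hypothetical helper names; it uses naive tuple comparison rather than a real, distro-aware version-parsing library, and the upper bounds assume the first unaffected release immediately follows the last affected one.

```python
# Sketch: check whether a Python interpreter version falls inside the
# affected ranges of the CVEs discussed above. Helper names are hypothetical,
# and naive tuple comparison stands in for real version parsing.

def parse(version: str) -> tuple:
    """Turn '2.7.13' into (2, 7, 13) for lexicographic comparison."""
    return tuple(int(part) for part in version.split("."))

# Per CVE: list of (inclusive lower bound, exclusive upper bound) ranges,
# transcribed from the ranges stated in the text.
AFFECTED = {
    "CVE-2017-1000158": [((2, 0, 0), (2, 7, 14))],   # Python up to 2.7.13
    "CVE-2019-9948":    [((2, 0, 0), (2, 7, 17))],   # urllib, 2.x up to 2.7.16
    "CVE-2019-9636":    [((2, 7, 0), (2, 7, 17)),    # 2.7.x up to 2.7.16
                         ((3, 0, 0), (3, 7, 3))],    # 3.x up to 3.7.2
}

def is_affected(cve: str, version: str) -> bool:
    v = parse(version)
    return any(lo <= v < hi for lo, hi in AFFECTED[cve])
```

For example, `is_affected("CVE-2019-9636", "3.7.2")` holds, while `"3.7.3"` falls outside the affected range.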
Information about all vulnerabilities can be found by visiting https://nvd.nist.gov/vuln/detail/
[TABLE 9: The most vulnerable packages (based on critical severity level). Columns: Package, Critical count, Image count; table data not recovered.]
6. Vulnerabilities in packages
Table 9 presents the packages that contain the most critical vulnerabilities. The critical count column is obtained by counting the total number of occurrences of critical vulnerabilities in each package, while the image count column is the number of images that use each package.

As expected, there is a clear relation between the most vulnerable packages and the most represented vulnerabilities (Section 5). For example, the vulnerabilities found in Python 2.x packages and in the lodash package are both presented in Section 5.

From Table 9, one can observe that the Python packages are by far the most used, and they therefore have the biggest impact on the threat landscape. The lodash-3.10.1 package is found in 76 images. This package contains the prototype pollution vulnerability affecting JavaScript code, which is also the most represented vulnerability in Table 8. Further, the jackson-databind package is represented with four different versions in Table 9 (entries 1, 3, 8, and 9). This package is used to transform JSON objects to Java objects (Lists, Numbers, Strings, Booleans, etc.), and vice versa. In total, these packages are used by 44 images, a relatively low number compared to the usage of the Python packages. Finally, the silverpeas-6.0.2 package contains 280 critical vulnerabilities and is used by only a single image: the silverpeas image on Docker Hub.

When considering the packages with the most critical vulnerabilities (Table 9), some are used by only a few images (like the silverpeas package). Therefore, Table 10 shows the vulnerability distribution in the most popular packages: the most used packages and the number of vulnerabilities present in them, across all severity levels.
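The two columns of Table 9 can be derived from the scan results with a small aggregation. Again, this is an illustrative sketch over a made-up findings list, not the authors' actual tooling:

```python
# Sketch: per-package critical count (total critical findings) and image
# count (distinct images using the package). Data is a made-up stand-in.
from collections import defaultdict

findings = [
    {"image": "imgA", "package": "python-2.7.13", "severity": "Critical"},
    {"image": "imgA", "package": "python-2.7.13", "severity": "Critical"},
    {"image": "imgB", "package": "python-2.7.13", "severity": "Critical"},
    {"image": "imgB", "package": "lodash-3.10.1", "severity": "Critical"},
    {"image": "imgC", "package": "lodash-3.10.1", "severity": "Negligible"},
]

critical_count = defaultdict(int)   # package -> total critical findings
images_using = defaultdict(set)     # package -> distinct images using it

for f in findings:
    images_using[f["package"]].add(f["image"])
    if f["severity"] == "Critical":
        critical_count[f["package"]] += 1

# Packages sorted by critical count, descending (the layout of Table 9).
for pkg in sorted(critical_count, key=critical_count.get, reverse=True):
    print(pkg, critical_count[pkg], len(images_using[pkg]))
```

Note that the critical count tallies findings (a package appearing in many images contributes once per image), while the image count deduplicates images via a set.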
The image count column contains the number of images that use each package. As observable from Table 10, the most used packages do not contain any critical, high, medium, or low vulnerabilities (except for one entry). However, they contain a vast number of negligible vulnerabilities, which are of less significance from a security point of view, as mentioned in previous sections.

The silverpeas image is available at https://hub.docker.com/_/silverpeas

[TABLE 10: Vulnerabilities in the most used packages. Columns: Package, Critical, High, Medium, Low, Negligible, Unknown, Image count; table data not recovered.]
7. Conclusions and future work
This paper summarizes the findings that we reported in a longer and much more detailed work [8]. We studied the vulnerability landscape in Docker Hub images by analyzing 2500 Docker images across the four image repository categories: official, verified, certified, and community. We found that as many as 82% of certified images contain at least one high or critical vulnerability, and that they are the most vulnerable when considering the median value. Official images came out as the most secure image type, with 45.9% of them containing at least one critical or high rated vulnerability. Only 17.8% of the images did not contain any vulnerabilities, and we found that the community images are the most exposed, as 8 out of the top 10 most vulnerable images are community images.

Concerning the technical specifics of the vulnerabilities, we found that the top five most severe vulnerabilities come from two of the most popular scripting languages, JavaScript and Python. Vulnerabilities in the lodash library and in Python packages are the most frequent and most severe. Furthermore, vulnerabilities related to execution of code and overflow are the most frequently found critical vulnerabilities. Our scripts and tools are available from [8] and from the GitHub repository.

For future work, we first propose two improvements that are beyond our control and are mostly connected with the maintenance of all 3.5 million images on the Docker Hub web site: (1) there is a need for a complete and well-documented endpoint for image data gathering; and (2) there is a need for improvements to the Docker Hub web pages to make it possible to access all images through navigation.

Concerning improvements of this work, we envision a future analysis that runs over a more extended period. All previous studies conducted in this field, as well as ours, have only analyzed vulnerabilities in Docker Hub images captured from one single data gathering.
Thus, changes in the data set over time are still not investigated. This type of analysis could reveal more in-depth details about the characteristics and evolution of the vulnerability landscape. Lastly, we suggest that future work target the false positives and false negatives of container scanners, for instance by integrating machine learning into them.
References

[1] A. Avram, "Docker: Automated and consistent software deployments," InfoQ, 2013.

[2] A. Esmaeily, K. Kralevska, and D. Gligoroski, "A Cloud-based SDN/NFV Testbed for End-to-End Network Slicing in 4G/5G," arXiv preprint arXiv:2004.10455.