A Historical and Statistical Study of the Software Vulnerability Landscape
Assane Gueye
CMU-Africa
Kigali, Rwanda
[email protected]

Peter Mell
National Institute of Standards and Technology
Gaithersburg, MD
[email protected]
Abstract—Understanding the landscape of software vulnerabilities is key for developing effective security solutions. Fortunately, the evaluation of vulnerability databases that use a framework for communicating vulnerability attributes and their severity scores, such as the Common Vulnerability Scoring System (CVSS), can help shed light on the nature of publicly published vulnerabilities. In this paper, we characterize the software vulnerability landscape by performing a historical and statistical analysis of CVSS vulnerability metrics over the period of 2005 to 2019, using data from the National Vulnerability Database. We conduct three studies analyzing the following: the distribution of CVSS scores (both empirical and theoretical), the distribution of CVSS metric values and how vulnerability characteristics change over time, and the relative rankings of the most frequent metric values over time. Our resulting analysis shows that the vulnerability threat landscape has been dominated by only a few vulnerability types and has changed little during the time period of the study. The overwhelming majority of vulnerabilities are exploitable over the network. The complexity to successfully exploit these vulnerabilities is dominantly low, and very little authentication to the target victim is necessary for a successful attack. Most of the flaws also require very limited interaction with users. On the positive side, however, the damage of these vulnerabilities is mostly confined within the security scope of the impacted components. A discussion of lessons that could be learned from this analysis is presented.
Index Terms—Vulnerabilities, Statistics
I. INTRODUCTION
Understanding the landscape of software vulnerabilities is a key step for developing effective security solutions. It is difficult to counter a threat that is not well understood. Fortunately, there exist vulnerability databases that can be analyzed to help shed light on the nature of publicly published software vulnerabilities. The National Vulnerability Database (NVD) [1] is one such repository. NVD catalogs publicly disclosed vulnerabilities and provides an analysis of their attributes and severity scores using the Common Vulnerability Scoring System (CVSS) [2]. CVSS is used extensively by security tools and databases and is maintained by the international Forum of Incident Response and Security Teams (FIRST) [3].

The CVSS provides a framework for describing vulnerability attributes and then scoring them as to their projected severity. The attributes are metric values that are the input to a CVSS equation that generates the score. It is the vulnerability attribute descriptions (the metric values) that are of primary interest to our work, although we also look at the raw scores. The use of CVSS by vulnerability databases provides a suite of low level metrics, encapsulated in a vector, describing the characteristics of each vulnerability. CVSS was initially released in 2005 [4], was completely revamped with version 2 (v2) in 2007 [5], and was updated with new and modified metrics in 2015 with the release of version 3 (v3) [6].

The software flaw vulnerability landscape was thoroughly analyzed in the scientific literature using v2 when it was first released [4], [8]–[13], but little work has been done since to evaluate changes to that landscape over time.
Also, in our literature survey we did not find a single study that uses the updated and significantly modified v3 to understand the software vulnerability landscape. (Minor update version 3.1 was released in 2019 [7], but the changes therein do not affect our work.)

In this paper, we use the CVSS v2 and v3 data provided by the NVD to undertake a historical and statistical analysis of the software vulnerability landscape over the period 2005 to 2019. More precisely, we conduct three studies analyzing the following:
• score distributions,
• metric value distributions,
• and relative rankings of the most frequent metric values.

For our first study, we analyze and compare the distributions of CVSS v2 and v3 scores as generated from the NVD data. We then compare the empirical distributions against the theoretical score distributions, assuming that all CVSS vectors are equally likely (which is not the case, but it is illustrative to evaluate the differences).

For our second study, we compute the distributions of the CVSS metric values (i.e., vulnerability characteristics) for each year. We then analyze the differences from 2005 to 2019 to determine if and how vulnerability characteristics change over time.

For our third study, we identify the most frequent metric values and analyze their relative rankings from 2015 to 2019. For each year and for both CVSS versions, we compute the values of the top 10 observed vulnerability metrics as well as their frequencies. We then generate parallel coordinates plots showing the values and frequencies of each metric for each year.

Our analysis shows that the software vulnerability landscape has been dominated by only a few vulnerability types and has changed very little from 2005 to 2019. For example, the overwhelming majority of vulnerabilities are exploitable over the network (i.e., remotely).
The complexity to successfully exploit these vulnerabilities is dominantly low, while attackers are generally not required to have any level of prior access to their targets (i.e., to have successfully authenticated) in order to launch an attack. Moreover, most of the flaws require very limited interaction with users. On the positive side, the damage of these vulnerabilities is mostly confined within the security scope of the impacted components. Few vulnerabilities obtain greater privileges than are available to the exploited vulnerable component.

Our findings are consistent with previous studies [8] (mainly based on CVSS version 2). This indicates that the same vulnerabilities are still being found in our software, suggesting that the community has not been doing a great job of correcting the most common vulnerabilities.

The remainder of this paper is organized as follows. Section II presents the CVSS data sets that constitute the basis of our study. Section III gives the details of our analysis and our discussion. Section IV provides a summary of related work, and Section V concludes.

II. THE CVSS DATASETS
CVSS consists of three metric groups: base, temporal, and environmental. The base group represents the intrinsic qualities of a vulnerability that are constant over time and across user environments, the temporal group reflects the characteristics of a vulnerability that change over time, and the environmental group represents the characteristics of a vulnerability that are unique to a user's environment [6]. In this work, we evaluate only the base metrics, as no extensive database of temporal scores exists and the environmental metrics are designed for an organization to customize base and temporal scores to their particular environment.

Tables I and II show the base score metrics and possible values for v2 and v3, respectively. A particular assignment of metric values is then used as input to the CVSS base score equations to generate scores representing the inherent severity of a vulnerability in general, apart from any particular environment. The raw score in the range from 0 to 10 is then often translated into a 'qualitative severity rating scale' (None: 0.0, Low: 0.1 to 3.9, Medium: 4.0 to 6.9, High: 7.0 to 8.9, and Critical: 9.0 to 10.0) [6].

Vulnerability analysts apply the metrics to vulnerabilities to generate CVSS vector strings. The vectors describe the metric values, but not the CVSS scores, for a particular vulnerability using a simplified notation.

TABLE I
CVSS v2 METRICS

CVSS v2 Metrics           Metric Values
Access Vector (AV)        Network (N), Adjacent (A), Local (L)
Access Complexity (AC)    Low (L), Medium (M), High (H)
Authentication (Au)       Multiple (M), Single (S), None (N)
Confidentiality (C)       Complete (C), Partial (P), None (N)
Integrity (I)             Complete (C), Partial (P), None (N)
Availability (A)          Complete (C), Partial (P), None (N)

TABLE II
CVSS v3 METRICS

CVSS v3 Metrics           Metric Values
Attack Vector (AV)        Network (N), Adjacent (A), Local (L), Physical (P)
Attack Complexity (AC)    Low (L), High (H)
Privileges Required (PR)  None (N), Low (L), High (H)
User Interaction (UI)     None (N), Required (R)
Scope (S)                 Unchanged (U), Changed (C)
Confidentiality (C)       High (H), Low (L), None (N)
Integrity (I)             High (H), Low (L), None (N)
Availability (A)          High (H), Low (L), None (N)

The NVD is the 'U.S. government repository of standards based vulnerability management data' [1]. It provides CVSS vectors and base scores for all vulnerabilities listed in the Common Vulnerabilities and Exposures (CVE) [14], [15] catalog of publicly disclosed software flaws. We use NVD to evaluate both CVSS v2 and v3 vectors and scores. The v2 data covers all CVE vulnerabilities published between 2005 and 2019. The v3 data ranges from 2015 to 2019 (only limited v3 data is available prior to 2015). These coverage dates result in the inclusion in our study of 118,173 v2 vectors and scores and 55,441 v3 vectors and scores.

III. DATA ANALYSIS
We analyze the NVD CVSS data in order to better understand the software vulnerability landscape. We investigate both the current nature of the threat posed by the existence and public disclosure of these vulnerabilities as well as how this threat has changed over time. To achieve this, we conduct the three studies described previously, where we analyze the following: score distributions, metric value distributions, and relative rankings of the most frequent metric values.
A. Score Distributions
The top graph of Figure 1 shows the theoretical distribution of the v3 scores (v2 scores are similar and not shown in the paper due to space limitations; they can be found in the appendix of [16]). These plots show what is expected if all CVSS vectors (i.e., vulnerability types) are equally likely to occur. Note how the theoretical distribution was designed, by the FIRST CVSS committee, to spread CVSS scores throughout the range with a somewhat normal distribution, with the most probable scores occurring in the middle of the distribution (biased a little to the right). That said, it is interesting that for both v2 and v3 some scores are not possible even though they lie within the valid range of score values.

The empirical distribution, in contrast, is shown in the bottom graph of Figure 1 for v3.

Fig. 1. Theoretical vs Empirical Score Distributions for CVSS version 3

The empirical data indicates a predominance of certain vectors (groupings of vulnerability characteristics) in the real world. Thus, only a few vulnerability feature sets describe the majority of publicly disclosed vulnerabilities. This leads to the frequent use of just a very small number of scores. A similar observation was made in a previous study of the v2 scoring system [8].

The results observed with v3, which uses data from 2015 to 2019 (since v3 vectors are not generally available prior to 2015), are similar to those with v2, which uses data from 2005 to 2019. Hence, the long-term trend obtained with CVSS v2 data is confirmed by the shorter-term data of CVSS v3.
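A minimal sketch of how such an empirical distribution can be tabulated from a list of base scores, alongside the qualitative severity buckets from Section II (the helper names and the sample scores are ours, for illustration only; in practice the scores would come from NVD records):

```python
from collections import Counter

def severity_rating(score: float) -> str:
    """Map a CVSS base score to its qualitative severity rating [6]."""
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"

def score_distribution(scores):
    """Relative frequency of each distinct base score, sorted by score."""
    counts = Counter(round(s, 1) for s in scores)
    n = len(scores)
    return [(s, counts[s] / n) for s in sorted(counts)]

# Hypothetical sample of v3 base scores.
sample = [9.8, 7.5, 9.8, 6.1, 7.5, 9.8, 5.4, 7.5]
dist = score_distribution(sample)
```

Plotting `dist` for the full NVD feed yields the kind of empirical histogram shown in the bottom graph of Figure 1.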
B. Metric Value Distributions
To investigate possible differences per year and trends over time more carefully, we focus on the distributions of each set of metric values per year over the time period of study. Figure 2 provides the histograms for v3 from 2015 to 2019. We have also plotted the histograms for v2 [16], which cover 2005 to 2019. The inclusion of v2 in the study allows for a comparison over 15 years as opposed to being limited to just 5 years with v3, due to its more recent development.

The histograms for individual metric values for v3 appear almost the same year to year for the 5 years of study. This is the same in v2 over the longer time period of 15 years, with some small exceptions: in 2014 the access vector (AV) value of adjacent had some significance (according to the NVD team, in an email received March 10, 2020, this was a one-time anomaly due to more than 800 CVEs all being announced simultaneously by an organization doing analyses on phone apps); the access complexity (AC) value of medium increased somewhat from 2007 onwards but then was steady; the authentication (Au) value of single increased slightly over the years; and the confidentiality (C), integrity (I), and availability (A) metric proportions between None, Partial, and Complete varied slightly from year to year while generally remaining about the same.

Overall, though, the software vulnerability landscape for publicly disclosed vulnerabilities has been almost static during the period of study. That said, comparing the v2 and v3 histograms, we do see some differences, but these are due to differences in the approaches of the two versions of CVSS. These differences are primarily seen in regards to the metrics C, I, and A, which we will discuss shortly.

Consider the AV metric, which reflects the context by which the vulnerability can possibly be exploited: Network (N), Adjacent (A), Local (L), or Physical (P). Both data sets show a high peak at N, a low peak at L, and almost nothing at A and P.
This indicates that the overwhelming majority of publicly disclosed software vulnerabilities are exploitable over the network (i.e., remotely), and it has been that way consistently through the period of study.

The AC metric describes the conditions beyond the attacker's control that must exist in order to exploit the vulnerability. When it is low (AC:L), the attacker can expect repeatable, easy successes, while when it is high (AC:H) the attack is less likely to be successful. The data shows that the AC metric is largely dominated by the value AC:L for v3, and by AC:L and medium (AC:M) for v2. This indicates that the set of publicly disclosed vulnerabilities has been predominantly easy to exploit.

This "easiness" to exploit vulnerabilities is confirmed by the other metrics for each CVSS version. For v3, the Privileges Required (PR) metric describes the level of privileges an attacker must possess before successfully exploiting a vulnerability. The User Interaction (UI) metric captures the requirements for a human user (other than the attacker) to participate in the successful compromising of the vulnerable components. The data shows that in most of the cases, no privilege is required and very little user interaction is needed for a successful attack.

Similarly, with v2, the Au metric measures the number of times an attacker must legitimately authenticate to a target in order to be in a position to exploit a vulnerability. The data shows that almost always, there is no authentication required prior to exploiting a vulnerability. Sometimes a single authentication is required, but almost never is there a vulnerability that requires multiple authentications in order to be successfully exploited.

CVSS v3 introduced a new Scope (S) metric, which captures the spill-over effect: how much a vulnerability in one vulnerable component impacts resources in components outside of its security scope.
When the scope is unchanged (S:U), there is no spill-over, while when the scope is changed (S:C) the vulnerability will very likely affect other components. The data shows that the scope metric has predominantly been S:U.

The last three metrics, C, I, and A, are common to both CVSS versions. They capture the extent to which a successful exploitation of a vulnerability will affect these three principles of security on the affected component.

Fig. 2. CVSS v3 metrics' values distributions over the years

With respect to these metrics, the v3 data shows that the impact on C, I, and A has predominantly been high (C:H, I:H, and A:H), with very similar distributions for all the years. The v2 data also shows a similar stationary behavior in the distributions. However, the difference in the fraction of high for v3 and complete for v2 is notable. One might expect the high values in CVSS v3 to be equivalent to the complete values for v2. However, this is not the case, as they are defined differently.
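The per-year tallies behind histograms like those in Figure 2 can be sketched as follows (a simplified version of our processing; the (year, vector) record format is illustrative and not the NVD schema):

```python
from collections import Counter, defaultdict

def metric_value_distributions(records):
    """Tally metric values per year from (year, vector) pairs.

    Returns {year: {metric: Counter(value -> count)}}, e.g.
    dists[2019]["AV"]["N"] is the number of 2019 vulnerabilities
    whose attack vector was Network.
    """
    dists = defaultdict(lambda: defaultdict(Counter))
    for year, vector in records:
        for part in vector.split("/"):
            key, _, value = part.partition(":")
            if key == "CVSS":  # skip the version prefix, e.g. "CVSS:3.1"
                continue
            dists[year][key][value] += 1
    return dists

# Two hypothetical records.
records = [
    (2019, "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"),
    (2019, "CVSS:3.1/AV:L/AC:L/PR:L/UI:R/S:U/C:H/I:N/A:N"),
]
dists = metric_value_distributions(records)
```

Normalizing each per-year Counter by the number of vulnerabilities published that year gives the proportions plotted in the histograms.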
C. Relative Rankings of the Most Frequent Metric Values
We now focus on the most prominent individual values of the metrics, evaluating the rankings of the top 10 metric values observed each year and providing a comparison between the years. Figure 3 shows the rankings for v3 (the same plots for v2 can be found in [16]). The y-axes show the top 10 most prevalent metric values, ordered from the least frequent to most frequent. Thus, the set of metric values included in the y-axis is significant (only the top ten are shown). The x-axes show the years. Each (x, y) point indicates that in year x the metric value at y has the rank indicated by the number in the circle. The size of the circle is proportional to the number of times that metric value appeared in a score in that year. For example, in Figure 3, in 2017 the metric value AV:N was the fourth most frequent metric value within the set of all v3 vectors. However, in 2018 and 2019 this metric value became the third most frequent. Notice that, in general, a value might appear in the top 10 of one year while not appearing in another year. Whenever that happens, we rank that particular value 11th for all the years in which it did not appear.

For v3 (see Figure 3), we observed that the same top 10 values appeared from 2016 to 2019. Furthermore, only one of those values is missing in the 2015 top 10. In addition, these values were ranked almost the same over the years. The top 2 are constant and in the same order over the time period 2015 to 2019. The top 4 and the bottom 4 (including the 11th appended value) are also constant, with minor changes in the order they appear over the years. The v2 data shows similar results (see [16]). This is another illustration of the stationary threat landscape observed earlier. It also corroborates the observations in Figure 1, that the landscape has been dominated by just a few vulnerability types.

In conclusion, our data indicates that the vulnerability threat landscape has been dominated by a few vulnerability types and has not evolved over the years.
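The per-year ranking underlying Figure 3 can be sketched as follows (our own simplified tally over hypothetical records; the vector format follows the v3 notation):

```python
from collections import Counter, defaultdict

def top_metric_values(records, k=10):
    """Rank the k most frequent metric values (e.g. "AV:N") per year.

    records: (year, vector) pairs. Returns {year: [(metric_value, count), ...]}
    ordered from most to least frequent; ties keep first-encountered order.
    """
    per_year = defaultdict(Counter)
    for year, vector in records:
        for part in vector.split("/"):
            if not part.startswith("CVSS"):  # skip the version prefix
                per_year[year][part] += 1
    return {year: counts.most_common(k) for year, counts in per_year.items()}

records = [
    (2017, "CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"),
    (2017, "CVSS:3.0/AV:N/AC:L/PR:L/UI:R/S:U/C:L/I:L/A:N"),
]
ranking = top_metric_values(records, k=3)
```

The counts returned per year are what determine the circle sizes in the parallel coordinates plots.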
The overwhelming majority of software vulnerabilities are exploitable over the network (i.e., remotely). The complexity to successfully exploit these vulnerabilities is dominantly low, and very little authentication to the target victim is necessary for a successful attack. Moreover, most of the flaws require very limited interaction with users. The damage of these vulnerabilities has, however, mostly been confined within the scope of the compromised systems.

IV. RELATED WORK
There are many efforts toward understanding the software vulnerability landscape. These efforts include reports by security solutions vendors [17], [18], white papers from non-profits such as MITRE [19] and SANS [20], as well as academic papers. (Any mention of commercial products or entities is for information only; it does not imply recommendation or endorsement.) For CVSS, most studies focused on the aggregation equation that produces the CVSS numerical scores representing the severity of the vulnerability. Surprisingly, we found no studies on v3 despite its preponderance in commercial security software.

Reference [8] is among the first statistical studies of the CVSS scoring system. It evaluated v1 and proposed improvements that contributed to the release of v2. Our study considers both v2 and v3 (but doesn't try to improve on either). Relative to the statistical evaluation, we consider our paper a continuation and update of the work in [8]. However, our work uses data from a much longer time period. It also goes one step further by analyzing association rules of vulnerability metrics.

Fig. 3. CVSS v3 top 10 rankings

It is worth noting that there are similarities between the results of the two studies. For instance, both papers show the predominance of certain types of vulnerabilities. Our temporal analysis (which was not performed in [8]) shows that this predominance is maintained over the years.

Reference [11] considers CVSS v1 and v2 and analyzes how effectively v2 addresses the deficiencies found in v1. It also identifies new deficiencies. In contrast, our motivation was to understand the threat landscape.

Reference [13] uses empirical data from an international cyber defense exercise to study how 18 security estimation metrics based on CVSS correlate with the actual time-to-compromise (TTC) of 34 successful attacks.
This study uses TTC as a dependent variable to analyze how well different security estimation models involving CVSS are able to approximate the actual security of network systems. The results suggest that security modeling with CVSS data alone does not accurately portray the time-to-compromise of a system. This result questions the applicability of the CVSS numerical scoring equation. Our study focused on the raw CVSS vectors, which represent the actual experts' opinions about the vulnerabilities.

Reference [21] uses NVD data to study trends and patterns in software vulnerabilities in order to predict the time to next vulnerability for a given software application. Data mining techniques were used as prediction tools. The vulnerability features used to aid the prediction are the published time of each vulnerability and its version. We believe that these features are not sufficiently informative. Instead, we directly use the eight metrics from the CVSS base scores, which constitute the best available information covering large multi-year sets of vulnerabilities.

Reference [22] also carried out a predictive study, based on the NVD/CVSS and ExploitDB [23] data. Using the CVSS data, it attempts to answer two questions: (1) Can we predict the time until a proof-of-concept exploit is developed based on the CVSS metrics? and (2) Are CVSS metrics populated in time to be used meaningfully for exploit delay prediction of CVEs?
The former is answered in the positive, while the latter is answered in the negative. While using the same datasets, our objective differs from that in [22]. We did not attempt to predict the threat landscape; we provide a thorough historical and statistical study of vulnerabilities for the last fifteen years.

The work in [24] is another assessment of CVSS. It evaluates the trustworthiness of CVSS by considering data found in five vulnerability databases: NVD, X-Force, OSVDB (Open Source Vulnerability Database), CERT-VN (Computer Emergency Response Team, Vulnerability Notes Database), and Cisco IntelliShield Alerts. It then uses a Bayesian model to study consistencies and differences. It concluded that CVSS is trustworthy and robust in the sense that most of the databases generally agree. This suggests that our focus on the NVD to study the threat landscape is justified: studies using data from the other databases will likely lead to the same conclusions.

All of the studies cited above are focused on v1 and v2. In our literature survey, we did not find a single study that uses the updated and significantly modified v3 to understand the software vulnerability landscape. We believe that the present paper is the first of its kind in doing so. Furthermore, our study is the first to use association rule mining and co-occurrence of vulnerability metrics' values in an attempt to characterize the software threat landscape.

V. CONCLUSION
Our data indicates that the vulnerability threat landscape for publicly disclosed vulnerabilities has been dominated by a few vulnerability types and has not significantly changed from 2005 to 2019. However, the underlying software flaw types that enable these vulnerabilities change dramatically from year to year (for example, see [25]). This means that many flaw types result in vulnerabilities with the same properties. This is bad news because it means that, as a security community, it will be difficult to eliminate certain vulnerability types because they result from a plethora of underlying software flaw types.

Another concern is that the overwhelming majority of software vulnerabilities are exploitable over the network. When developing software, efforts should be made to reduce unnecessary connections, protect necessary ones, and require more authentication where possible to reduce the attack surface area. Another significant issue is that most of the vulnerabilities require no sophistication to be exploited (but again, this is hard to improve upon due to the many software flaw types that allow this).

These two factors, combined with the finding that most vulnerabilities require very limited interaction with users, facilitate the widespread hacking occurring today. Often in the security literature the human is cited as the weakest link. While certainly humans can be exploited, within the set of CVE-type vulnerabilities the exploitation of humans plays a very minor role; training humans will have little impact in this area.

Overall, this study documents the security community's inability to eliminate any types of vulnerabilities through addressing the related software flaw types. In 15 years, the vulnerability landscape hasn't changed; through the lens of the metrics in this paper, we aren't making progress. Perhaps we as a community need to "stop and think" about the ways we are developing software and/or the methods we use to identify vulnerabilities. The security community needs new approaches.
We don't want to write this same paper 15 years from now showing that, once again, nothing has changed. In short, this study shows that either we (the community) are incapable of correcting the most common software flaws, or we are focusing on the wrong flaws.

REFERENCES
[4] National Infrastructure Advisory Council, Vulnerability Disclosure Working Group, Vulnerability Scoring Subgroup, 2004.
[5] P. Mell, K. Scarfone, and S. Romanosky, "A complete guide to the common vulnerability scoring system version 2.0," published by FIRST-Forum of Incident Response and Security Teams.
[8] IET Information Security, vol. 1, no. 3, pp. 119–127, 2007.
[9] P. Mell, K. Scarfone, and S. Romanosky, "Common vulnerability scoring system," IEEE Security & Privacy, vol. 4, no. 6, pp. 85–89, 2006.
[10] H. Holm and K. K. Afridi, "An expert-based investigation of the common vulnerability scoring system," Computers & Security, vol. 53, pp. 18–30, 2015.
[11] K. Scarfone and P. Mell, "An analysis of CVSS version 2 vulnerability scoring," in Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement. IEEE Computer Society, 2009, pp. 516–525.
[12] R. Wang, L. Gao, Q. Sun, and D. Sun, "An improved CVSS-based vulnerability scoring mechanism." IEEE, 2011, pp. 352–355.
[13] H. Holm, M. Ekstedt, and D. Andersson, "Empirical analysis of system-level vulnerability metrics through actual attacks," IEEE Transactions on Dependable and Secure Computing, vol. 9, no. 6, pp. 825–837, 2012.
[14] D. W. Baker, S. M. Christey, W. H. Hill, and D. E. Mann, "The development of a common enumeration of vulnerabilities and exposures," in Recent Advances in Intrusion Detection, vol. 7, 1999, p. 9.
[15] "Common vulnerabilities and exposures," 2020, accessed: 2020-2-5. [Online]. Available: https://cve.mitre.org
[16] A. Gueye and P. Mell, "A Historical and Statistical Study of the Software Vulnerability Landscape," arXiv e-prints.
[21] International Conference on Database and Expert Systems Applications. Springer, 2011, pp. 217–231.
[22] A. Feutrill, D. Ranathunga, Y. Yarom, and M. Roughan, "The effect of common vulnerability scoring system metrics on vulnerability exploit delay."
[24] IEEE Transactions on Dependable and Secure Computing, 2016.
[25] "National vulnerability database, CWE over time," 2019, accessed: 2019-12-10. [Online]. Available: https://nvd.nist.gov/general/visualizations/vulnerability-visualizations/cwe-over-time

APPENDIX