Vulnerability Coverage as an Adequacy Testing Criterion

Abstract

Mainstream software applications and tools are configurable platforms with an enormous number of parameters and parameter values. Certain settings and possible interactions between these parameters may harden (or soften) the security and robustness of these applications against some known vulnerabilities. However, the large number of vulnerabilities reported for and associated with these tools makes exhaustive testing of the tools against these vulnerabilities infeasible. As an instance of the general software testing problem, the research question to address is whether the system under test is robust and secure against these vulnerabilities. This paper introduces the idea of "vulnerability coverage," a concept for adequately testing a given application against certain classes of vulnerabilities, as reported by the National Vulnerability Database (NVD). The driving idea is to utilize the Common Vulnerability Scoring System (CVSS) as a means to measure the fitness of test inputs generated by evolutionary algorithms, then identify through pattern matching the vulnerabilities that match the generated vulnerability vectors, and finally test the system under test for those identified vulnerabilities. We report the performance of two evolutionary algorithms (i.e., Genetic Algorithms and Particle Swarm Optimization) in generating the vulnerability pattern vectors.


SHUVALAXMI DASS,

Texas Tech University, USA

AKBAR SIAMI NAMIN,

Texas Tech University, USA

CCS Concepts: • Security and privacy → Software security engineering; • Software and its engineering → Software configuration management and version control systems.

Additional Key Words and Phrases: Software Vulnerability Testing, Vulnerability Coverage, Genetic Algorithms (GA), Particle Swarm Optimization (PSO)

ACM Reference Format:

Shuvalaxmi Dass and Akbar Siami Namin. 2020. Vulnerability Coverage as an Adequacy Testing Criterion. 1, 1 (June 2020), 7 pages. https://doi.org/10.1145/3341105.3374099

Authors’ addresses: Shuvalaxmi Dass, Texas Tech University, P.O. Box 43104, Lubbock, Texas, USA, 79409-3104; Akbar Siami Namin, Texas Tech University, P.O. Box 43104, Lubbock, Texas, USA, 79409-3104.

© 2020 Copyright held by the owner/author(s).

Software systems and applications are often released with a great number of features and settings. These features and configurations serve their users and the underlying platforms for different purposes such as architectural settings, virtualization, performance, security and access control, privacy, and system-level interactions. For instance, MySQL Version 5.5 lists more than 600 configuration parameters categorized into three groups, namely Server Options, System Variables, and Status Variable References [2]. While these parameters give administrators great flexibility in setting up software systems properly, improper configuration of such parameters also creates loopholes in the systems, leaving them vulnerable to certain known or even unknown security attacks (i.e., zero-day vulnerabilities [4]).

According to the National Vulnerability Database (NVD) [3], as of September 2019, there are 1,644 records of reported vulnerabilities with assigned CVE numbers. Some of these vulnerabilities are directly caused by improper settings of the configuration parameters offered as features by the software systems. From the software testing perspective, enumerating all configuration settings and then verifying whether the given software is vulnerable to certain attacks is infeasible.

This paper introduces the concept of “vulnerability coverage” as an adequacy criterion for choosing instances of vulnerabilities that the software under test needs to be checked against. The driving idea is to utilize the Common Vulnerability Scoring System (CVSS) as a means to measure the vulnerability level of the software under test. For instance, the vulnerability CVE-2019-16383 reported for MySQL has its severity rated as 8.[…] […] The terms Vulnerability Vector/Pattern, CVSS vector, and configurations are used interchangeably.

This paper is structured as follows: Section 2 reviews the related work. The concept of the Common Vulnerability Scoring System (CVSS) is presented in Section 3. The methodology of adapting the evolutionary algorithms is presented in Section 4. The evaluation of the proposed idea, performed on a case study, is reported in Section 5. Section 6 concludes the paper.
The work presented in this paper offers some solutions for generating a thorough set of secure configurations for a given system when implementing Moving Target Defense (MTD) strategies [13].

Crouse and Fulp [8] used conventional Genetic Algorithms to implement a Moving Target Defense (MTD) environment that enables security through temporal and spatial diversity in computer configuration parameters. They reported that the pool of configurations becomes stale when no changes are introduced to the set of configurations over a period of time; as a result, the GA deals with a limited set of configurations. In their approach, the fitness (security) of aging configurations is reduced by a value (i.e., decay value) based on the time since they were last active. Such weak configurations are eventually replaced by more secure ones.

Lucas et al. [10] described a framework for implementing MTD at the host level. Their framework uses an evolutionary-inspired Genetic Algorithm (GA) to generate secure configurations. They evaluated their framework using two qualitative measurements: fitness score and pairwise Hamming distance (i.e., diversity).

The use of genetic algorithms in generating a thorough set of test inputs has been discussed and modeled in the literature. For instance, Andrews et al. [5] used genetic algorithms to make random testing more effective. A similar approach is adopted here to produce better test inputs, with the purpose of maximizing the coverage level of the test pool using evolutionary algorithms.

The Common Vulnerability Scoring System (CVSS) provides a way to capture the principal characteristics of a vulnerability and produce a numerical score reflecting its severity. The scoring system also provides a textual representation of the semantics of the calculated score. The numerical score can then be translated into a qualitative representation (e.g., low, medium, high, and critical) to help organizations properly assess and prioritize their vulnerability management processes [1].
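As a small illustration of the translation from numerical score to qualitative rating, the sketch below uses the rating bands published for CVSS v3.x (None, Low, Medium, High, Critical). The function name `severity_rating` is our own and is not part of the paper's scripts.

```python
def severity_rating(score: float) -> str:
    """Map a CVSS v3.x score (0.0 to 10.0) to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:      # 0.1 to 3.9
        return "Low"
    if score < 7.0:      # 4.0 to 6.9
        return "Medium"
    if score < 9.0:      # 7.0 to 8.9
        return "High"
    return "Critical"    # 9.0 to 10.0


if __name__ == "__main__":
    for s in (0.0, 2.1, 5.5, 8.8, 9.8):
        print(s, severity_rating(s))
```

Under these bands, the low-score region targeted later in the paper falls into the Low and lower Medium ratings.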

CVSS is composed of three main metric groups: (1) Base, (2) Temporal, and (3) Environmental, each consisting of a set of sub-metrics. Without loss of generality and to demonstrate the feasibility of the proposed approach, the GA and PSO algorithms are applied only to the Base metric group. An additional reason for focusing on the Base metric group is that it quantifies the essential characteristics of a vulnerability, which remain unchanged across different environments and over time. The Base metric group consists of two main sub-metric groups:

(1) Exploitability Metrics. These describe the “how” part of the attack that is being captured, which depends on the characteristics of the vulnerable component. They consist of:
– Attack Vector (AV). It reflects the proximity the attacker needs in order to attack the vulnerable component. The closer the attacker must be to the component, the harder the attack is. The attack vector takes on four values: Network (N), Adjacent (A), Local (L), and Physical (P).
– Attack Complexity (AC). This metric reflects the resources and conditions that are required to conduct the exploit on the vulnerable component. The more conditions that must be met, the higher the complexity of the attack. It takes on two values: Low (L) and High (H).
– Privileges Required (PR). This metric represents the level of privileges required by an attacker to successfully launch an exploit. The lower the required level, the easier the attack. It takes on three values: None (N), Low (L), and High (H).
– User Interaction (UI). It reflects whether the participation of the user is required for launching a successful attack. The attack becomes more difficult if user interaction is mandatory. This metric takes on two values: None (N) and Required (R).

(2) Impact Metrics. These metrics reflect the characteristics of the impacted components. They consist of:
– Availability Impact. It measures the severity of the attack on the availability of the impacted component. The metric takes on three values: None (N), Low (L), and High (H).
– Integrity Impact. It measures the severity of the attack on the integrity of the impacted component. The metric takes on three values: None (N), Low (L), and High (H).
– Confidentiality Impact. It measures the severity of the attack on the confidentiality of the impacted component. It takes on three values: None (N), Low (L), and High (H).

(3) Scope Metrics. There is also a metric called “scope,” which describes the scope of the attack (i.e., whether the attack on the vulnerable component consequently impacted resources beyond its means). It takes on two values: Unchanged (U) and Changed (C).

CVSS incorporates all of the aforementioned metrics in a formula to calculate the vulnerability score. The lower the Base score is, the harder it is to execute an exploit on the vulnerable component. Furthermore, the score is measured using a certain number of features. For instance, Figure 1 shows the CVSS score along with the vulnerability vector for CVE-2019-10665 reported for MySQL. As the figure shows, the severity of this vulnerability is rated as 9.[…], with the vector [AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H], where [1]:
– AV:N indicates that the Attack Vector (AV) of this vulnerability is set at the Network (N) level.
– AC:L implies that the Attack Complexity (AC) of this vulnerability is Low (L).
– PR:N shows that the Privileges Required (PR) for launching an attack based on this vulnerability is None (N).
– UI:N reports that the User Interaction (UI) needed to launch a successful attack is None (N).
– S:U shows that the Scope (S) of the attack is Unchanged (U).
– C:H indicates that the risk of losing the Confidentiality (C) of data when this attack occurs is High (H).
– I:H implies that the risk of losing the Integrity (I) of data when this attack occurs is High (H).
– A:H indicates that the risk of losing the Availability (A) of data when this attack occurs is also High (H).

Fig. 1. The description of CVE-2019-10665 for MySQL.
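To make the scoring formula concrete, the following is a minimal sketch of the Base score computation as published in the CVSS v3.1 specification (metric weights, Impact and Exploitability sub-scores, and the specification's round-up rule). It is a standalone illustration, not the authors' code; the function names `parse_vector`, `roundup`, and `base_score` are our own.

```python
import math

# Metric weights from the CVSS v3.1 specification (Base metric group).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.50}


def parse_vector(vector: str) -> dict:
    """Split 'AV:N/AC:L/...' into {'AV': 'N', 'AC': 'L', ...}."""
    body = vector.strip().strip("[]")
    fields = dict(part.split(":", 1) for part in body.split("/"))
    fields.pop("CVSS", None)  # drop an optional 'CVSS:3.1' prefix
    return fields


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 Base score of a vector string."""
    m = parse_vector(vector)
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    # Impact sub-score
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    # Exploitability sub-score
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10.0))


if __name__ == "__main__":
    print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))
```

For the vector [AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H] discussed above, this sketch evaluates to 9.8.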

We compared two different evolutionary algorithms for vulnerability pattern generation targeting a certain level of CVSS score as the fitness function. The optimization algorithms that we implemented were: 1) Genetic and evolutionary Algorithms (GA), and 2) Particle Swarm Optimization (PSO).

We chose the CVSS score as the fitness measure because the diversity of its parameters enables us to adapt typical optimization and search techniques such as evolutionary algorithms to this problem. Moreover, other researchers have also adapted these types of greedy algorithms to address similar problems in test input generation in the context of random testing (e.g., [5]). Each parameter in the configuration is assigned a score vector that represents the vulnerability it contains along with its severity. The score is modeled after the Common Vulnerability Scoring System (CVSS) vector and provides a method for measuring the security level of an individual configuration parameter setting. The vector can serve as a foundation for estimating the number of possible vulnerabilities of a certain configuration. Moreover, to account for diversity in the configuration generation, we also calculated the Hamming distance, which measures how different one pattern (i.e., configuration) is from another.
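The Hamming distance between two CVSS vector strings can be sketched as the number of metric fields whose values differ. The helper names below (`hamming_distance`, `mean_pairwise_distance`) are illustrative assumptions, and averaging over all pairs is only one possible way to summarize the diversity of a pool.

```python
from itertools import combinations


def hamming_distance(vec_a: str, vec_b: str) -> int:
    """Count the metric fields in which two CVSS vector strings differ."""
    fields_a = dict(p.split(":", 1) for p in vec_a.split("/"))
    fields_b = dict(p.split(":", 1) for p in vec_b.split("/"))
    if fields_a.keys() != fields_b.keys():
        raise ValueError("vectors must define the same metrics")
    return sum(fields_a[k] != fields_b[k] for k in fields_a)


def mean_pairwise_distance(pool) -> float:
    """Average Hamming distance over all pairs in a pool (a diversity proxy)."""
    pairs = list(combinations(pool, 2))
    if not pairs:
        return 0.0
    return sum(hamming_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    a = "AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"
    b = "AV:L/AC:L/PR:N/UI:N/S:U/C:L/I:H/A:H"
    print(hamming_distance(a, b))  # differs in AV and C
```

With the eight Base fields, the distance between any two vectors lies between 0 and 8.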

We developed a Python-based genetic algorithm script to generate a pool of CVSS vector strings with the best fitness score (i.e., = […]). […] “string” since each vector is treated as a string in our implementation. We set the number of iterations = 50 and population size = […].

(1) Configuration Generation: An initial pool of 100 possible CVSS vector strings was generated randomly.
(2) Fitness Score: The Python library called CVSS3, which implements the Base metric score, was used to calculate the CVSS scores. Each vector string was assigned its CVSS score as its fitness score if the score lay in the target range (2.[…]5); otherwise, it was assigned 100.
(3) Breeder’s Selection: This type of selection not only chooses the best solutions (i.e., vectors with lower scores) of the previous generation but also selects some random ones to avoid converging too soon to a local minimum.
(4) Crossover: For crossover, we took the simplest approach of randomly switching the values of corresponding metrics between the two parent vector strings.
(5) Mutation: We performed mutation on the CVSS vector configurations by randomly picking one vector field and changing its value to one selected randomly from its permissible values.
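The five steps above can be sketched as follows. This is a minimal illustration rather than the authors' script: the scorer re-implements the CVSS v3.1 Base formula inline instead of calling a library, the target range `LOW`/`HIGH` (2.0 to 5.0) is an assumption suggested by the score ranges in Table 1, and `mutation_rate`, the selection sizes, and all function names are hypothetical. Only the penalty of 100, the 50 iterations, and the pool of 100 come from the text.

```python
import math
import random

# Permissible values for the eight Base metric fields (CVSS v3.x).
FIELDS = {"AV": "NALP", "AC": "LH", "PR": "NLH", "UI": "NR",
          "S": "UC", "C": "NLH", "I": "NLH", "A": "NLH"}
W = {"AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
     "AC": {"L": 0.77, "H": 0.44},
     "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
     "UI": {"N": 0.85, "R": 0.62},
     "CIA": {"H": 0.56, "L": 0.22, "N": 0.0}}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
LOW, HIGH = 2.0, 5.0   # assumed target range, not stated in the text
PENALTY = 100.0        # fitness assigned outside the target range


def base_score(v):
    """CVSS v3.1 Base score of a vector given as a field dictionary."""
    changed = v["S"] == "C"
    pr = PR_CHANGED[v["PR"]] if changed else W["PR"][v["PR"]]
    iss = 1 - math.prod(1 - W["CIA"][v[k]] for k in "CIA")
    impact = (7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
              if changed else 6.42 * iss)
    if impact <= 0:
        return 0.0
    expl = 8.22 * W["AV"][v["AV"]] * W["AC"][v["AC"]] * pr * W["UI"][v["UI"]]
    total = min((1.08 if changed else 1.0) * (impact + expl), 10.0)
    scaled = round(total * 100000)  # specification's round-up rule
    return scaled / 100000.0 if scaled % 10000 == 0 else (scaled // 10000 + 1) / 10.0


def fitness(v):
    """Step 2: the score itself inside the target range, else the penalty."""
    score = base_score(v)
    return score if LOW <= score <= HIGH else PENALTY


def random_vector(rng):
    """Step 1: one random configuration."""
    return {f: rng.choice(vals) for f, vals in FIELDS.items()}


def breeder_selection(pop, n_best, n_lucky, rng):
    """Step 3: keep the fittest plus a few random survivors."""
    ranked = sorted(pop, key=fitness)
    return ranked[:n_best] + rng.sample(ranked[n_best:], n_lucky)


def crossover(a, b, rng):
    """Step 4: take each field from one parent at random."""
    return {f: rng.choice((a[f], b[f])) for f in FIELDS}


def mutate(v, rng):
    """Step 5: re-draw one randomly chosen field."""
    child = dict(v)
    field = rng.choice(list(FIELDS))
    child[field] = rng.choice(FIELDS[field])
    return child


def evolve(pop_size=100, iterations=50, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    pop = [random_vector(rng) for _ in range(pop_size)]
    for _ in range(iterations):
        parents = breeder_selection(pop, pop_size // 4, pop_size // 10, rng)
        pop = []
        for _ in range(pop_size):
            child = crossover(*rng.sample(parents, 2), rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            pop.append(child)
    return sorted(pop, key=fitness)


def to_string(v):
    return "/".join(f"{f}:{v[f]}" for f in FIELDS)


if __name__ == "__main__":
    for vec in evolve()[:5]:
        print(to_string(vec), fitness(vec))
```

Because lower fitness is better, sorting by `fitness` places in-range, low-score vectors first and penalized vectors last.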

We also implemented the PSO algorithm based on the CVSS scores in order to compare the performance of PSO with GA in generating a set of secure configurations. To keep the comparison unbiased, we kept the best score, the number of iterations, and the population size the same as those set for GA.

The PSO implementation is similar to GA. However, unlike GA, PSO is easier to implement as it has no evolution operators such as crossover and mutation. In PSO, the potential solutions are called particles. The algorithm mainly deals with two parameters: (1) pbest_fitness and (2) particle_vel, which contain the initial pbest fitness and velocity values for every particle in the swarm, respectively. The velocity of each particle measures how far its fitness score (pbest) is from the best score.

The PSO algorithm returns a count of particles with scores = […] […] particle_velocity (i.e., Hamming distance) in the range [0, 8], as 0 and 8 are the minimum and maximum number of differences that might exist between two particles, respectively. The fitness_range is set in the range of […], where 2.[…].

Particle initialization follows a strategy similar to the one described for configuration generation in GA. The algorithm continues to find the best fitness value and velocity for every particle until it reaches a threshold limit. In each iteration, the algorithm assigns each particle's pbest_fitness (particle best) value to be its CVSS score cvss_fit only when that fitness is better (i.e., in our case, lower is better) than its current pbest fitness value. It then assigns the gbest (global best) value of the swarm to be the best pbest value obtained so far by any particle in the population. After finding its two best fitness values, each particle updates itself in a fashion similar to GA, where the configurations are mutated whenever their current velocity values are greater than their previous ones.
For instance, if the ‘AV’ vector field is chosen randomly, then one value is randomly picked from the set {N, A, L, P}.
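The pbest/gbest bookkeeping described above can be sketched for discrete vectors as follows. This is one possible interpretation and not the authors' update rule: the per-field pull toward gbest and pbest, the probabilities `c_global` and `c_personal`, and the toy fitness (Hamming distance to a fixed `TARGET`, standing in for the CVSS-based fitness) are all assumptions made for illustration. The names `pbest_fitness` and `particle_vel` follow the text.

```python
import random

# Permissible values for the eight Base metric fields (CVSS v3.x).
FIELDS = {"AV": "NALP", "AC": "LH", "PR": "NLH", "UI": "NR",
          "S": "UC", "C": "NLH", "I": "NLH", "A": "NLH"}


def hamming(a, b):
    """Number of fields (0..8) in which two particles differ."""
    return sum(a[f] != b[f] for f in FIELDS)


def pso(fitness, swarm_size=100, iterations=50, c_global=0.3,
        c_personal=0.3, seed=0):
    """Discrete PSO sketch; lower fitness is better."""
    rng = random.Random(seed)
    swarm = [{f: rng.choice(vals) for f, vals in FIELDS.items()}
             for _ in range(swarm_size)]
    pbest = [dict(p) for p in swarm]
    pbest_fitness = [fitness(p) for p in swarm]
    best = min(range(swarm_size), key=pbest_fitness.__getitem__)
    gbest, gbest_fitness = dict(pbest[best]), pbest_fitness[best]
    particle_vel = [hamming(p, gbest) for p in swarm]

    for _ in range(iterations):
        for i, particle in enumerate(swarm):
            # Pull each field toward the global or the personal best.
            for f in FIELDS:
                r = rng.random()
                if r < c_global:
                    particle[f] = gbest[f]
                elif r < c_global + c_personal:
                    particle[f] = pbest[i][f]
            # Velocity is taken as the Hamming distance to gbest; mutate
            # one field when it is greater than the previous velocity.
            velocity = hamming(particle, gbest)
            if velocity > particle_vel[i]:
                field = rng.choice(list(FIELDS))
                particle[field] = rng.choice(FIELDS[field])
                velocity = hamming(particle, gbest)
            particle_vel[i] = velocity
            # Update pbest, then gbest, only on strict improvement.
            score = fitness(particle)
            if score < pbest_fitness[i]:
                pbest[i], pbest_fitness[i] = dict(particle), score
                if score < gbest_fitness:
                    gbest, gbest_fitness = dict(particle), score
    return gbest, gbest_fitness, pbest_fitness


# Toy fitness standing in for the CVSS-based one: distance to a fixed target.
TARGET = dict(AV="L", AC="L", PR="N", UI="N", S="U", C="L", I="N", A="N")


def toy_fitness(particle):
    return hamming(particle, TARGET)


if __name__ == "__main__":
    best, best_fit, _ = pso(toy_fitness)
    print(best, best_fit)
```

Any callable that maps a field dictionary to a number can replace `toy_fitness`, including a CVSS score with an out-of-range penalty as used for GA.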

We evaluated the performance of the two evolutionary algorithms through a case study. Table 1 reports the percentage of CVSS vulnerability vectors produced in different CVSS score ranges across 100 runs for both the GA and PSO scripts. We observed that the values were very similar.

Algorithm | [2.0] | (2.0, 3.0] | (2.0, 4.0] | (2.0, 5.0]
GA | […] | […] | […] | […]
PSO | […] | […] | […] | […]

Table 1. The percentage of CVSS vectors produced by GA and PSO for the different score ranges in 100 runs.

A possible explanation for obtaining similar results from both GA and PSO is that the number of vector fields for CVSS (i.e., Base level) is limited to eight. As a result, there are not many options for GA or PSO to select from, and both algorithms converge quickly to the same values because the search space is very small. Considering all the other metrics in CVSS might provide a larger search space for the algorithms.

To illustrate the effectiveness of the introduced adequacy criterion for security testing, we looked up the CVE website [7] and identified different vulnerabilities whose CVSS patterns matched the vulnerability pattern produced by GA and PSO. As an example, based on the CVSS pattern AV:L/AC:L/PR:N/S:N/C:P/I:N/A:N (in C:P, P stands for Partial or Low as per the website), Table 2 shows multiple CVEs of vulnerabilities we found in the product MySQL for different vendors, along with their descriptions and CVSS scores, with exact CVSS vector pattern matching.

Vendor | CVE ID | Description | CVSS score
Mysql | CVE-2019-14939 | An issue was discovered in the mysql (aka mysqljs) module 2.17.1 for Node.js. The LOAD DATA LOCAL INFILE option is open by default. | 2.1
Mysql | CVE-2016-7440 | The C software implementation of AES Encryption and Decryption in wolfSSL (formerly CyaSSL) before 3.9.10 makes it easier for local users to discover AES keys by leveraging cache-bank timing differences. | 2.1
Oracle | CVE-2014-6551 | Unspecified vulnerability in Oracle MySQL Server 5.5.38 and earlier and 5.6.19 and earlier allows local users to affect confidentiality via vectors related to CLIENT:MYSQLADMIN. | 2.1
Oracle | CVE-2012-3160 | Unspecified vulnerability in the MySQL Server component in Oracle MySQL 5.1.65 and earlier, and 5.5.27 and earlier, allows local users to affect confidentiality via unknown vectors related to Server Installation. | 2.1
Mysql | CVE-2006-4031 | MySQL 4.1 before 4.1.21 and 5.0 before 5.0.24 allows a local user to access a table through a previously created MERGE table, even after the user’s privileges are revoked for the original table, which might violate intended security policy. | 2.1

Table 2. The vulnerabilities found for CVSS pattern AV:L/AC:L/PR:N/S:N/C:P/I:N/A:N.
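The pattern-matching step can be sketched as a field-by-field comparison between a generated pattern and the vectors attached to vulnerability records. The record list below is an in-memory stand-in: the five CVE identifiers and their vector come from Table 2, while the last entry (`CVE-XXXX-0000`) is a hypothetical non-matching placeholder added only to show filtering. The function names are our own, and no NVD or CVE API is used.

```python
def parse(pattern: str) -> dict:
    """Split 'AV:L/AC:L/...' into {'AV': 'L', 'AC': 'L', ...}."""
    return dict(part.split(":", 1) for part in pattern.split("/"))


def matches(record_vector: str, pattern: str) -> bool:
    """True when every field of the pattern has the same value in the record."""
    rec, pat = parse(record_vector), parse(pattern)
    return all(rec.get(field) == value for field, value in pat.items())


def select(records, pattern):
    """Return the identifiers of all records whose vector matches the pattern."""
    return [r["id"] for r in records if matches(r["vector"], pattern)]


PATTERN = "AV:L/AC:L/PR:N/S:N/C:P/I:N/A:N"  # pattern reported with Table 2

RECORDS = [
    {"id": "CVE-2019-14939", "vector": PATTERN},
    {"id": "CVE-2016-7440", "vector": PATTERN},
    {"id": "CVE-2014-6551", "vector": PATTERN},
    {"id": "CVE-2012-3160", "vector": PATTERN},
    {"id": "CVE-2006-4031", "vector": PATTERN},
    # Hypothetical placeholder record with a different vector.
    {"id": "CVE-XXXX-0000", "vector": "AV:N/AC:L/PR:N/S:N/C:P/I:N/A:N"},
]

if __name__ == "__main__":
    print(select(RECORDS, PATTERN))
```

Matching on the pattern's fields only means a partial pattern (for example, just `AV:L/C:P`) selects every record that agrees on those fields, which is one way to pick a representative set instead of exact matches.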

This paper introduced the concept of “vulnerability coverage” as an adequacy criterion for security and vulnerability testing of software applications. The driving idea is to utilize the Common Vulnerability Scoring System (CVSS) as a fitness metric and identify a set of vulnerability vector patterns that achieves a certain level of CVSS score. The generated set can be used for adequacy testing of the underlying system, in which all or representative sets of vulnerabilities with similar vulnerability vector patterns are selected for further inspection of the system under test. The paper compared two evolutionary algorithms, namely Genetic Algorithms and Particle Swarm Optimization, and the results indicated that these two greedy algorithms perform similarly.

The novel idea of a vulnerability adequacy criterion as introduced in this paper needs further attention. To the best of our knowledge, an adequacy criterion based on vulnerability coverage does not exist. Several other features need to be investigated, including the other metrics incorporated into CVSS and the National Vulnerability Database (NVD), such as the temporal and environmental metrics. Furthermore, the idea needs tool support and further empirical studies to thoroughly search the NVD for reported vulnerabilities with the exact pattern matching property for security testing purposes and to investigate the effectiveness of such an adequacy criterion.

The usefulness of Bayesian approaches has been discussed extensively in the literature [11]. These probabilistic reasoning approaches can be adapted in the context of uncertainty analysis for implementing adaptive security testing in dynamic domains (e.g., reinforcement reasoning [6]). It is possible to apply these learning-based algorithms along with temporal properties and dependencies and then adapt deep learning-based approaches [12] to address the problem. In the presence of constraints in the configuration settings, the problem can be formulated as a constraint satisfaction problem, with test inputs generated using symbolic execution [9].

ACKNOWLEDGMENT

This research work is supported in part by funding from the National Science Foundation under grant numbers 1516636 and 1821560.

REFERENCES

[4] […] IEEE BigData.
[5] James H. Andrews, Felix Chun Hang Li, and Tim Menzies. 2007. Nighthawk: a two-level genetic-random unit test data generator. In IEEE/ACM International Conference on Automated Software Engineering. 144–153.
[6] Moitrayee Chatterjee and Akbar Siami Namin. 2019. Detecting Phishing Websites through Deep Reinforcement Learning. In Annual Computer Software and Applications Conference, COMPSAC […]
[8] […] Symposium on Configuration Analytics and Automation. 1–7.
[9] Marcel Heimlich and Akbar Siami Namin. 2019. TestLocal: just-in-time parametrized testing of local variables. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, SAC. 1874–1877.
[10] Brian Lucas, Errin W. Fulp, David J. John, and Daniel Cañas. 2014. An Initial Framework for Evolving Computer Configurations As a Moving Target Defense. In The Annual Cyber and Information Security Research Conference. 69–72.
[11] Akbar Siami Namin and Mohan Sridharan. 2010. Bayesian reasoning for software testing. In the Workshop on Future of Software Engineering Research. 349–354.
[12] Sima Siami-Namini, Neda Tavakoli, and Akbar Siami Namin. 2019. The Performance of LSTM and BiLSTM in Forecasting Time Series. In IEEE BigData.
[13] Jianjun Zheng and Akbar Siami Namin. 2019. A Survey on the Moving Target Defense Strategies: An Architectural Perspective.