A Testability Analysis Framework for Non-Functional Properties
Michael Felderer, Bogdan Marculescu, Francisco Gomes de Oliveira Neto, Robert Feldt, Richard Torkar
Michael Felderer, Bogdan Marculescu
Blekinge Institute of Technology
Karlskrona, Sweden
[email protected]

Francisco Gomes de Oliveira Neto, Robert Feldt, Richard Torkar
Chalmers and the University of Gothenburg
Gothenburg, Sweden
Abstract—This paper presents the background, the basic steps, and an example of a testability analysis framework for non-functional properties.
Index Terms—testability, extra-functional properties, non-functional properties, robustness, software testing
I. INTRODUCTION
Testability is a quality attribute that evaluates the effectiveness and efficiency of testing: if the testability of a software artifact is high, then finding faults by means of testing is easier. A lower degree of testability results in increased test effort, and thus in less testing performed for a fixed amount of time [1].

While software testability has been extensively investigated (in a recent systematic literature review the authors identified 208 papers [2]), the focus has always been on functional testing, while non-functional properties are often neglected [3]. Little is known regarding the testability of non-functional properties. Thus, there is ample opportunity to investigate the relationship between software testability and different non-functional properties. In this paper we contribute to this unexplored field by characterising and exemplifying a testability analysis framework for non-functional properties. The aim of such an analysis framework is to predict and allocate test resources, assist in testability design, compare testing approaches or, more generally, to support decision making in projects. The framework is developed based on an in-depth analysis of available testability definitions, testability frameworks, and work on the testability of non-functional properties.

II. BACKGROUND AND RELATED APPROACHES
In this section, we present background on testability definitions and related testability measurement frameworks, as well as related work on software testability and non-functional properties. From each part of the section, we draw some conclusions (shown in boxes at the end of each subsection) to guide the development of testability measurement frameworks for non-functional properties.
A. Testability Definitions
Software testability is now established to be a distinct software quality characteristic [4]. However, testability has always been an elusive, context-sensitive concept and its correct measurement is a difficult exercise [5]. Therefore, the notion of software testability has been subject to a number of different interpretations by standards and experts. In their systematic review on software testability, Garousi et al. [2] provide, overall, 33 definitions of testability extracted from different papers and standards.

A comprehensive testability definition is provided in the ISO/IEC Standard 25010 on system and software quality models. It defines testability as the degree of effectiveness and efficiency with which test criteria can be established for a system, product or component and tests can be performed to determine whether those criteria have been met. The definition refers to the effectiveness and efficiency aspects of testability and makes explicit that testability is context-dependent with respect to the applied test criteria and the relevant artifacts under test.

Some testability definitions explicitly cover the efficiency aspect, e.g., when defining testability as the effort required to test software [6], or the effectiveness aspect, e.g., as a measure of how easily software exposes faults when tested [7]. Other testability definitions define it explicitly via the core testability factors of observability and controllability, e.g., when defining (domain) testability as the ease of modifying a program so that it is observable and controllable [8]. Finally, there are also testability definitions that provide a more holistic view and also take human and process aspects of testability into account. This is, for instance, the case in the testability definitions how easy it is to test by a particular tester and test process, in a given context [9] and property of both the software and the process that refers to the easiness for applying all the [testing] steps and on the inherent [ability] of the software to reveal faults during testing [10].

TD1 Testability is relative to the test criteria and artifacts under test.
TD2 Testability is determined by effectiveness and efficiency measures for testing.
TD3 Testability has product, process and human aspects.

B. Available Testability Measurement Frameworks

Most available work on testability provides specific techniques or methods [2], but models, metrics and frameworks are also available. In this section, we summarise three relevant and representative empirical frameworks for testability, based on the collection provided in [2], that support testability measurement.

Binder [11] provides a testability framework for object-oriented systems. In [11] the author claims that testability is a result of six high-level factors: (1) characteristics of the representation, (2) characteristics of the implementation, (3) built-in test capabilities, (4) the test suite, (5) the test support environment, and (6) the software development process. Each factor is further refined into sub-characteristics, for which metrics and relationships are occasionally also defined. For instance, structure is one sub-characteristic of implementation, with assigned complexity metrics like number of methods per class.

Mouchawrab et al. [5] provide a well-founded measurement framework for object-oriented software testability. The main aim of the framework is to improve testability during software design based on UML diagrams.
For each testing phase, i.e., unit, integration, system, and regression testing, attributes that potentially have an impact on software testability in that phase are provided. For each testability attribute, a list of measurable sub-attributes is defined. For instance, for unit testing the testability attribute unit size is defined with the metrics local features and inherited features (measured for class diagrams). The framework is complemented by a theory and its associated hypotheses. For instance, one hypothesis states that increasing the number of local features to be tested increases the cost of unit testing as more test cases are likely to be required and oracles may increase in complexity if they need to account for additional attributes.

Bach [9] defines five 'practical' testability types, i.e., epistemic testability ("How narrow is the gap between what we know and what we need to know about the status of the product under test"), value-related testability ("Testability influenced by changing the quality standard or our knowledge of it"), project-related testability ("Testability influenced by changing the conditions under which we test"), intrinsic testability ("Testability influenced by changing the product itself"), and subjective testability ("Testability influenced by changing the tester or the test process"). For each testability type, characteristics are defined, e.g., domain knowledge or testing skills for subjective testability, and observability and controllability for intrinsic testability. Furthermore, relationships like improving test strategy might decrease subjective testability or vice versa are defined.

None of the available testability frameworks examines testability, and its relationship to other non-functional properties, in any detail. However, we can draw some conclusions to guide the development of testability analysis frameworks:

TF1 Testability frameworks define testability characteristics and respective metrics for a specific testability context.
TF2 Testability frameworks define statements that put the testability context, characteristics and metrics in relation to each other.
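To make TF1 and TF2 concrete, the following minimal sketch shows one way such a framework could be encoded as data. All names (Metric, Characteristic, the unit-size example) are illustrative assumptions echoing the frameworks above, not an API defined by any of them:

```python
# Minimal sketch of TF1/TF2: testability characteristics with metrics,
# bound to a context, plus relation statements. Names are illustrative.
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class Metric:
    name: str
    compute: Callable[[Any], float]   # maps an artifact under test to a value

@dataclass
class Characteristic:
    name: str
    metrics: List[Metric] = field(default_factory=list)

@dataclass
class TestabilityContext:
    name: str                                            # TF1: context-specific
    characteristics: List[Characteristic] = field(default_factory=list)
    relations: List[str] = field(default_factory=list)   # TF2: statements

unit_testing = TestabilityContext(
    name="unit testing, measured on class diagrams",
    characteristics=[Characteristic("unit size", [
        Metric("local features", lambda cls: float(len(cls["local"]))),
        Metric("inherited features", lambda cls: float(len(cls["inherited"]))),
    ])],
    relations=["more local features -> more test cases -> higher unit-testing cost"],
)
```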
C. Software Testability and Non-Functional Properties
As highlighted before, software testability and its relationship to non-functional properties is a relatively unexplored field. However, two literature reviews on software testability and its relationship to the non-functional properties robustness [3] and performance [12] were recently published.

The literature review on software testability and robustness includes, overall, 27 primary studies. The most frequently addressed testability issues investigated in the context of robustness are observability, controllability, automation, and testing effort. The most frequently addressed robustness issues are fault tolerance, handling external influence, and exception handling. Metrics that consider testability and robustness together are rare. In general, authors report a positive relationship between software testability and software robustness [3].

The literature review on software testability and performance includes, overall, 26 primary studies. The most frequently addressed testability issues investigated in the context of performance are observability, controllability, automation, and testing effort. Note that the most frequently addressed testability issues in the contexts of robustness and performance are identical. The most frequently addressed performance issues are timeliness, response time, and memory usage. Again, metrics that consider testability and performance together are rare.

Furthermore, González et al. [13] present a measurement framework for the runtime testability of component-based systems that relates testability and performance. As runtime testing is, unlike traditional testing, performed on the final execution environment, it interferes with the system state or resource availability. The framework therefore identifies the test sensitivity characteristics component state, component interaction, and resource limitations and availability, which determine whether testing interferes with the state of the running system or its environment in an unacceptable way, as well as the test isolation techniques state separation, interaction separation, resource monitoring and scheduling, which provide countermeasures for test sensitivity.

TN1 The most frequently addressed testability issues for robustness and performance are observability, controllability, automation, and testing effort.
TN2 The most frequently addressed robustness issues are fault tolerance, handling external influence, and exception handling.
TN3 The most frequently addressed performance issues are timeliness, response time, and memory usage.

III. TOWARDS A MEASUREMENT FRAMEWORK FOR NON-FUNCTIONAL TESTABILITY
In this section, we first present the goals of a measurement framework for non-functional properties and then sketch our Testability Causation Analysis Framework, taking the findings from the previous section into account.
A. Overview and Goals
Our goal is to develop a measurement framework for non-functional properties based on the findings of the previous section. As testability is a relative concept (see TD1) and has different aspects (see TD3), it is not possible to develop a single measurement system that covers all non-functional properties, aspects and contexts. We need a general framework that can be adapted to these points of variation and be instantiated to provide guidance to conceptualise, analyse and measure testability in specific situations.

Available frameworks that have been successfully applied and evaluated for functional testability analysis often take a layered approach and add detail for a specific testability context, set of characteristics and related variables (see TF1). Based on our analysis of existing frameworks above, we adapt and extend the object-oriented testability framework (OOTF) of Mouchawrab et al. [5] to address the testability of non-functional properties. Their framework is practical and can be used both for approximate, qualitative assessment of testability ('Would testability increase or decrease, given a certain change?') and as a basis for more exact, quantitative assessment ('How much will testability increase or decrease given a change of this size in this variable?'). A basic assumption it makes is also that the cost to test to a certain level of quality is a natural and hands-on way to conceptualise testability. We thus reuse some aspects of the framework while adapting, extending and generalising it so that it can be applied not only during the analysis and design stages of object-oriented software but for the analysis of non-functional properties on any type of software system.

The reusable elements include the different levels and the decomposition of testability into characteristics, sub-characteristics and attributes. That allows the OOTF framework to be adapted towards specific conditions of testability [5]. However, it is not obvious that non-functional (NF) properties can be captured in this way. For example, the OOTF framework distinguishes the different levels of testing (unit, integration, system and regression), thus aggregating attributes from lower levels into the higher levels. That distinction between levels of testing is harder to make, or even not needed, when dealing with NF properties, since NF testing does not always apply, or differ, at all levels of testing. Moreover, the OOTF framework does not clearly include factors that account for other aspects of testability such as the process, the company/environment, or the considered testing techniques.

To summarise, our contributions relative to the existing framework are four-fold: i) to generalise from OO software to any type of software system, ii) to focus on non-functional testability rather than functional, iii) to clarify that the same framework can be used both qualitatively and quantitatively, and iv) to consider more types of factors of the situation than only design-related factors of the SUT. In the following, we further detail our proposed framework, called the Testability Causation Analysis Framework.

B. TCAF: Testability Causation Analysis Framework
TCAF is mostly to be used qualitatively, but we see a natural extension to quantitative use. Our adaptation focuses on analysing testability in terms of the inputs that mediate or directly affect it (e.g., the SUT, the test technique(s) being used, human and organisational facets, etc.) and their effects on testability outputs (primarily the cost and effectiveness of testing).

For testability outputs, we argue that NF properties are typically not atomic and need to be broken down into sub-characteristics or issues. This allows a more detailed analysis. For instance, if we choose robustness as the NF property, there are the sub-characteristics identified in the literature review of Hassan et al. [3], i.e., exception handling, fault tolerance, and handling of external influences. Depending on the specific NF property and the level of detail one wants, these might need to be further sub-divided into characteristics. Once this division has been made, we have identified a set of NF attributes. For each attribute we then identify specific testability outputs.

An underlying aspect of testability is to measure the time/effort/cost needed to perform a certain type of testing [6], [5], which we will refer to simply as TestCost. Therefore, all NF sub-characteristics should be connected to a cost variable. Conversely, in order to capture effectiveness, i.e., the (quality) level to which the testing of the NF (sub-)characteristic has been achieved, we need attribute-specific variables that will often vary depending on a variety of factors. We refer to those attribute-specific variables as the extent of testability, or simply TestabilityExtent. (Note that we explicitly exclude efficiency here, since it can be defined as effectiveness divided by cost and is thus indirectly being analysed via its sub-components.)

Note that, in some contexts, the extent can be a binary variable where stakeholders do (or do not) have the necessary instruments and dependencies to test the NF attribute, i.e., it is not necessarily continuous. Other scenarios allow a degree of the extent to which testability can be measured (similar to coverage variables). For instance, a situation where a test technique can only be partially applied would mean a reduced extent of the measured testability. A typical example would be when there is a fixed time or cost for conducting a certain type of testing.

In brief, our framework thus decomposes testability into several levels, beginning with the non-functional property of interest and then further into, potentially, several levels of sub-characteristics to arrive at the NF attributes we consider. Each such NF attribute is then connected to testability output variables (TOVs), i.e., TestCost and TestabilityExtent, that capture aspects of the testability factors in terms of cost and extent of testing. The main idea of TCAF is then to consider which input factors would cause a change in these NF TOVs. These input factors are captured in testability input variables (TIVs) that are typically of at least three types: those that capture i) the surrounding environment, namely the context (e.g., team configuration, processes used, experience with the used test techniques), ii) the system under test (e.g., system complexity, number of test interfaces, number of arguments and types of those interfaces), and iii) the test techniques considered (e.g., test optimisation and test generation). Given the input factors and the output factors they (potentially) affect, one can then proceed to qualitatively analyse the direction and strength of this causation, or to model it statistically and thus be able to predict those effects.

The following steps further detail how to use the TCAF framework:
1) Identify Testability Output Variables (TOVs) specific to the non-functional property considered and its different sub-characteristics. These variables will always include the TestCost variable, but can also have TestabilityExtent variables. Outputs: layered decomposition of the NF property into NF attributes and testability output variables for each of the attributes.
2) Identify the set of test techniques to be considered or compared in terms of testability. If it is already given that a certain technique can only reach a certain degree of TestabilityExtent for an NF attribute, it need not be further modelled in subsequent steps. If the test techniques imply specific sub-activities in order to be applied, refine the TestCost variables from Step 1 to be specific for each sub-activity (designing, executing, reporting the tests, etc.).
3) Identify system and context attributes that have an impact on TOVs, and define TIVs for them. The test techniques themselves might also have variation points that lead to additional TIVs to include.
4) Analyse the effect that TIVs have on TOVs. This can happen either qualitatively or quantitatively. For the latter, we need quantification of TIV values as well as statistical modelling of the TOVs based on the TIVs. For the former, one needs experience- or research-based reasoning about the level or direction of the effect.

We believe that the TCAF framework can help build a causal model of how different attributes/variables of the TIVs (e.g., context/SUT/test techniques) determine different aspects of testability in terms of TOVs, i.e., very much how it has been done in other disciplines [14]. An added benefit is to be able to quantify those variables and, eventually, statistically model the strength of their effect on testability. Given recent progress on actually analysing causality, rather than simply correlating variables with statistical methods, this would now be realistic [15]. This feature would be highly relevant and useful for estimating/predicting the TOVs related to cost, while it may be harder to quantify and then predict TOVs measuring the testability extent.
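As a hypothetical illustration, the outputs of Steps 1-3 can be captured as plain data on which the Step 4 analysis then operates. The sketch below anticipates the robustness example of the next section; all concrete names are assumptions made for illustration, not prescribed by TCAF:

```python
# Sketch of a TCAF instance (outputs of Steps 1-3); names are illustrative.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class NFAttribute:
    """Step 1: a leaf of the layered NF decomposition with its TOVs."""
    name: str
    tovs: List[str]

@dataclass
class TCAFInstance:
    nf_property: str
    attributes: List[NFAttribute]               # Step 1
    test_techniques: List[str]                  # Step 2
    tivs: Dict[str, List[str]] = field(default_factory=dict)  # Step 3, by type
    # Step 4 (qualitative): (TIV, TOV) -> expected direction of effect
    effects: Dict[Tuple[str, str], str] = field(default_factory=dict)

robustness = TCAFInstance(
    nf_property="robustness",
    attributes=[
        NFAttribute("handling atypical inputs", ["CostAtypical", "ExtentAtypical"]),
        NFAttribute("handling invalid inputs", ["CostInvalid", "ExtentInvalid"]),
    ],
    test_techniques=["GodelTest"],
    tivs={"context": ["automated-testing experience", "SBST experience"],
          "SUT": ["input complexity", "SUT connectivity"],
          "technique": ["generators", "mutators", "distances"]},
)
```

A qualitative Step 4 then fills the effects map with directions of influence, while a quantitative Step 4 would replace it with a fitted statistical model over measured TIVs.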
IV. EXAMPLE APPLICATION: ROBUSTNESS TESTING
This section sketches an example of how TCAF can be applied for testing robustness. An overview of the relevant TIVs is shown in Figure 1.
[Figure 1 depicts the TIVs of the example as three groups feeding into robustness testability: GödelTest factors (generators, mutators, distances), SUT factors (SUT connectivity, input complexity), and context factors (automated testing experience, SBST experience).]

Fig. 1. Testability causation analysis example, focusing on robustness.
In the following, we explain each of the four steps to instantiate TCAF.
Step 1: Testability output variables. For this example we will focus on robustness. For the sake of brevity, we only consider two exception-handling sub-characteristics of robustness: the system's ability to handle atypical and invalid inputs. The TOVs are the cost for and extent to which we can test the two NF attributes: CostAtypical, ExtentAtypical, CostInvalid, ExtentInvalid.

Step 2: Test technique. We consider a single test technique in this example: GödelTest, a search-based, automated test data generation technique. GödelTest [16] has been shown to be useful for robustness testing [17]. It defines a method of developing valid test inputs of any complexity using a generator, to explore invalid test data using one or more mutation operators, all driven by a distance metric to assess how "far away" the generated inputs are from the typical test cases. Each of the three components (generator, mutation operators, distance metric) needs to be in place for the technique to work, so the TestCost associated with each will be assessed separately. When applying this test technique to a large software under test (SUT) we can further consider all these factors for each and every interface of the SUT that we want to test for robustness, but for the sake of this example we only consider one interface.
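The three components can be pictured with a toy sketch for a single interface whose valid inputs are arrays of small non-negative integers. This is not the actual GödelTest implementation [16]; it merely mirrors its structure:

```python
# Toy sketch of the three GodelTest-style components for one interface.
import random

def generator(max_len=10):
    """Produce a valid input (the generator encodes what 'valid' means)."""
    return [random.randint(0, 100) for _ in range(random.randint(1, max_len))]

def mutate(xs):
    """Mutation operator: perturb a valid input into a (likely) invalid one."""
    ys = list(xs)
    ys[random.randrange(len(ys))] = random.choice([-1, 10**9])  # out of range
    return ys

def distance(xs, typical_len=5, typical_mean=50.0):
    """Distance metric: how 'far away' an input is from typical test cases."""
    mean = sum(xs) / len(xs)
    return abs(len(xs) - typical_len) + abs(mean - typical_mean) / typical_mean

valid = generator()
candidate = mutate(valid)   # search would favour candidates with high distance
print(distance(valid), distance(candidate))
```

The TestCost of adopting the technique is then dominated by writing such a generator, mutation operators, and distance metric for each real interface.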
Step 3: System and context variables. An example of a context attribute that would have an impact on the cost of adopting the technique is the relative experience that the company and its testers and developers have with automated testing tools in general, and with search-based software testing (SBST) tools and GödelTest in particular. The more experienced the testers and developers are, and the more experienced the company is in developing and using automated testing tools, the lower the costs are likely to be. In addition, the complexity of the SUT is also likely to be an important factor. For example, cost is likely to increase with the number and relative complexity of input data types: it is clearly much easier to define a generator for arrays of integers than for graphs represented in XML files.
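Anticipating the qualitative part of Step 4, the TIVs just identified can be recorded together with the direction of effect we expect them to have on the cost TOVs. The names and directions below are assumptions made for this example:

```python
# Hypothetical TIVs for the example, with the expected direction of effect
# on the cost TOVs (input to the qualitative Step 4 analysis).
expected_effects = {
    ("context", "automated-testing experience"): "decreases TestCost",
    ("context", "SBST/GodelTest experience"):    "decreases TestCost",
    ("SUT", "number of input data types"):       "increases TestCost",
    ("SUT", "input data type complexity"):       "increases TestCost",
}
for (group, tiv), effect in expected_effects.items():
    print(f"{group:>7} | {tiv}: {effect}")
```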
Step 4: Causal effects. The effects can be analysed depending on the amount of information available, and this analysis can be updated over time. An initial evaluation would most likely be qualitative, focusing on whether each of the TIVs has an effect, and whether that effect is likely to be positive or negative. A company may conclude that it does not have many testers or developers with SBST experience, and that this is likely to have a negative impact on the cost of adopting GödelTest. Or it might decide that applying robustness testing on all interfaces is not called for and the testing needs to be more focused. As more information becomes available, the analysis can be refined, first as a qualitative analysis focusing on discrete steps. For example, when looking at the components of GödelTest, the company may conclude that it has a number of testing tools that allow the generation of inputs for their SUTs. Thus, generators are available for a relatively low cost. At the other extreme, mutation operators would likely be custom, incurring significant cost to develop and validate, in particular if the input data types are complex and company- or system-specific.

While the analyses for CostAtypical and CostInvalid should be quite similar, there is a difference in the number and type of mutation operators needed; the mutation operators for generating atypical inputs are much less complex since we are using the generator as is (atypical inputs are still valid and thus should be captured in the way the generator is defined). Similarly, there are many more invalid data than valid, and thus atypical, data, so ExtentInvalid will have to be much more constrained and will directly affect CostInvalid. This indicates that more complex analysis or statistical modelling might be needed. It is not always the case that testability outputs can be predicted only from the inputs; outputs might sometimes influence each other.

When possible, the analysis would move more toward quantitative assessments (a minimal modelling sketch is given below) and include more attributes and factors. For robustness, we could consider other robustness aspects from the literature [18] as well. A company with experience in working with SBST systems, for example, may be able to estimate the cost of implementing GödelTest quite accurately, as well as have a clearer understanding of the effect the implementation would have on its products. However, regardless of the level of detail used, TCAF can help structure the testability analysis and make it concrete.
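To sketch what a quantitative Step 4 could look like, the example below fits a simple linear model of TestCost on two quantified TIVs. The observations are synthetic and purely illustrative, and a real analysis would also need to heed the distinction between correlation and causation [15]:

```python
# Sketch of a quantitative Step 4: least-squares fit of TestCost on two
# quantified TIVs. The observations are synthetic, for illustration only.
import numpy as np

# TIVs per past project: [SBST experience (years), input complexity (1-5)]
X = np.array([[0, 1], [1, 2], [3, 2], [5, 4], [2, 5]], dtype=float)
y = np.array([40.0, 35.0, 22.0, 18.0, 45.0])   # observed TestCost (person-hours)

X1 = np.hstack([np.ones((len(X), 1)), X])       # prepend an intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("intercept:", coef[0], "experience effect:", coef[1],
      "complexity effect:", coef[2])
```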
V. CONCLUSION
In this paper we present a testability causation analysis framework for non-functional properties. The framework is developed based on available frameworks and review studies on testability, and prototypically applied to robustness testing.

The framework is used in four steps. First, testability output variables, including test cost and testability extent, are identified. Second, the set of test techniques to be considered is identified. Third, system and context attributes are identified as testability input variables. Fourth, the effect that testability input variables have on testability output variables is analysed.

So far the framework has not been evaluated. In future work, we therefore plan to refine and evaluate the testability causation analysis framework for different non-functional properties, including robustness, performance, security and energy consumption (as well as their inter-dependence), in different contexts.

ACKNOWLEDGMENT
The paper was partly funded by the Knowledge Foundation (KKS) of Sweden through the project 20130085: Testing of Critical System Characteristics (TOCSYC).

REFERENCES

[1] J. M. Voas and K. W. Miller, "Software testability: The new verification," IEEE Software, vol. 12, no. 3, pp. 17–28, 1995.
[2] V. Garousi, M. Felderer, and F. Nur Kilicaslan, "What we know about software testability: A survey," ArXiv e-prints, Jan. 2018.
[3] M. M. Hassan, W. Afzal, M. Blom, B. Lindström, S. F. Andler, and S. Eldh, "Testability and software robustness: A systematic literature review," in Proceedings of the 41st Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 2015, pp. 341–348.
[4] ISO/IEC, "ISO/IEC 25010:2011 Systems and software engineering – Systems and software quality requirements and evaluation (SQuaRE) – System and software quality models," 2011.
[5] S. Mouchawrab, L. C. Briand, and Y. Labiche, "A measurement framework for object-oriented software testability," Information and Software Technology, vol. 47, no. 15, pp. 979–997, 2005.
[6] ISO/IEC/IEEE, "ISO/IEC/IEEE 24765:2010 Systems and software engineering – Vocabulary," 2010.
[7] T. Yu, W. Wen, X. Han, and J. H. Hayes, "Predicting testability of concurrent programs," in Proceedings of the 10th IEEE International Conference on Software Testing, Verification and Validation (ICST). IEEE, 2016, pp. 168–179.
[8] R. Poston, J. Patel, and J. S. Dhaliwal, "A software testing assessment to manage project testability," in Proceedings of the 20th European Conference on Information Systems (ECIS).
[9] …
[10] … in Proceedings of Technology of Object-Oriented Languages and Systems. IEEE, 1999, pp. 96–107.
[11] R. V. Binder, "Design for testability in object-oriented systems," Communications of the ACM, vol. 37, no. 9, pp. 87–101, 1994.
[12] M. M. Hassan, W. Afzal, B. Lindström, S. M. A. Shah, S. F. Andler, and M. Blom, "Testability and software performance: A systematic mapping study," in Proceedings of the 31st Annual ACM Symposium on Applied Computing. ACM, 2016, pp. 1566–1569.
[13] A. González, E. Piel, and H.-G. Gross, "A model for the measurement of the runtime testability of component-based systems," in Software Testing, Verification and Validation Workshops (ICSTW). IEEE, 2009, pp. 19–28.
[14] G. Imbens and D. Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, 2015.
[15] J. Peters, D. Janzing, and B. Schölkopf, Elements of Causal Inference: Foundations and Learning Algorithms, ser. Adaptive Computation and Machine Learning Series. Cambridge, MA, USA: The MIT Press, 2017.
[16] R. Feldt and S. Poulding, "Finding test data with specific properties via metaheuristic search," in …, Nov. 2013, pp. 350–359.
[17] S. Poulding and R. Feldt, "Generating controllably invalid and atypical inputs for robustness testing," in …, Mar. 2017, pp. 81–84.
[18] A. Shahrokni and R. Feldt, "A systematic review of software robustness," …