A Testability Analysis Framework for Non-Functional Properties
Michael Felderer, Bogdan Marculescu, Francisco Gomes de Oliveira Neto, Robert Feldt, Richard Torkar
Michael Felderer, Bogdan Marculescu
Blekinge Institute of Technology
Karlskrona, Sweden
[email protected]

Francisco Gomes de Oliveira Neto, Robert Feldt, Richard Torkar
Chalmers and the University of Gothenburg
Gothenburg, Sweden
Abstract—This paper presents the background, the basic steps, and an example of a testability analysis framework for non-functional properties.
Index Terms—testability, extra-functional properties, non-functional properties, robustness, software testing
I. INTRODUCTION
Testability is a quality attribute that evaluates the effectiveness and efficiency of testing: if the testability of a software artifact is high, then finding faults by means of testing is easier. A lower degree of testability results in increased test effort, and thus in less testing performed for a fixed amount of time [1].

While software testability has been extensively investigated (in a recent systematic literature review the authors identified 208 papers [2]), the focus has always been on functional testing, while non-functional properties are often neglected [3]. Little is known regarding the testability of non-functional properties. Thus, there is ample opportunity to investigate the relationship between software testability and different non-functional properties. In this paper we contribute to this unexplored field by characterising and exemplifying a testability analysis framework for non-functional properties. The aim of such an analysis framework is to predict and allocate test resources, assist in testability design, compare testing approaches or, more generally, to support decision making in projects. The framework is developed based on an in-depth analysis of available testability definitions, testability frameworks, and work on the testability of non-functional properties.

II. BACKGROUND AND RELATED APPROACHES
In this section, we present background on testability definitions and related testability measurement frameworks, as well as related work on software testability and non-functional properties. From each part of the section, we draw some conclusions (shown in boxes at the end of each subsection) to guide the development of testability measurement frameworks for non-functional properties.
A. Testability Definitions
Software testability is now established to be a distinct software quality characteristic [4]. However, testability has always been an elusive, context-sensitive concept and its correct measurement is a difficult exercise [5]. Therefore, the notion of software testability has been subject to a number of different interpretations by standards and experts. In their systematic review on software testability, Garousi et al. [2] provide, overall, 33 definitions of testability extracted from different papers and standards.

A comprehensive testability definition is provided in the ISO/IEC Standard 25010 on system and software quality models. It defines testability as the degree of effectiveness and efficiency with which test criteria can be established for a system, product or component and tests can be performed to determine whether those criteria have been met. The definition refers to the effectiveness and efficiency aspects of testability and makes explicit that testability is context-dependent with respect to the applied test criteria and the relevant artifacts under test.

Some testability definitions explicitly cover the efficiency aspect, e.g., when defining testability as the effort required to test software [6], or the effectiveness aspect, e.g., as a measure of how easily software exposes faults when tested [7]. Other testability definitions define it explicitly via the core testability factors of observability and controllability, e.g., when defining (domain) testability as the ease of modifying a program so that it is observable and controllable [8]. Finally, there are also testability definitions that provide a more holistic view and also take human and process aspects of testability into account. This is, for instance, the case in the testability definitions how easy it is to test by a particular tester and test process, in a given context [9] and property of both the software and the process that refers to the easiness for applying all the [testing] steps and on the inherent [ability] of the software to reveal faults during testing [10].

TD1 Testability is relative to the test criteria and artifacts under test.
TD2 Testability is determined by effectiveness and efficiency measures for testing.
TD3 Testability has product, process and human aspects.

B. Available Testability Measurement Frameworks

Most available work on testability provides specific techniques or methods [2], but models, metrics and frameworks are also available. In this section, we summarise three relevant and representative empirical frameworks for testability, based on the collection provided in [2], that support testability measurement.

Binder [11] provides a testability framework for object-oriented systems. In [11] the author claims that testability is a result of six high-level factors: (1) characteristics of the representation, (2) characteristics of the implementation, (3) built-in test capabilities, (4) the test suite, (5) the test support environment, and (6) the software development process. Each factor is further refined into sub-characteristics, for which metrics and relationships are occasionally also defined. For instance, structure is one sub-characteristic of implementation, with assigned complexity metrics like number of methods per class.

Mouchawrab et al. [5] provide a well-founded measurement framework for object-oriented software testability. The main aim of the framework is to improve testability during software design based on UML diagrams.
For each testing phase, i.e., unit, integration, system, and regression testing, attributes that potentially have an impact on software testability in that phase are provided. For each testability attribute, a list of measurable sub-attributes is defined. For instance, for unit testing the testability attribute unit size is defined with the metrics local features and inherited features (measured for class diagrams). The framework is complemented by a theory and its associated hypotheses. For instance, one hypothesis states that increasing the number of local features to be tested increases the cost of unit testing as more test cases are likely to be required and oracles may increase in complexity if they need to account for additional attributes.

Bach [9] defines five 'practical' testability types, i.e., epistemic testability ("How narrow is the gap between what we know and what we need to know about the status of the product under test"), value-related testability ("Testability influenced by changing the quality standard or our knowledge of it"), project-related testability ("Testability influenced by changing the conditions under which we test"), intrinsic testability ("Testability influenced by changing the product itself"), and subjective testability ("Testability influenced by changing the tester or the test process"). For each testability type, characteristics are defined, e.g., domain knowledge or testing skills for subjective testability, and observability and controllability for intrinsic testability. Furthermore, relationships like improving test strategy might decrease subjective testability or vice versa are defined.

None of the available testability frameworks examines testability, and its relationship to other non-functional properties, in any detail. However, we can draw some conclusions to guide the development of testability analysis frameworks:

TF1 Testability frameworks define testability characteristics and respective metrics for a specific testability context.
TF2 Testability frameworks define statements that put the testability context, characteristics and metrics in relation to each other.
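To make TF1 and TF2 concrete, the following minimal sketch shows one way such a framework could be encoded as data. All names (Metric, Characteristic, the unit-size example) are illustrative assumptions echoing the frameworks above, not an API defined by any of them:

```python
# Minimal sketch of TF1/TF2: testability characteristics with metrics,
# bound to a context, plus relation statements. Names are illustrative.
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class Metric:
    name: str
    compute: Callable[[Any], float]   # maps an artifact under test to a value

@dataclass
class Characteristic:
    name: str
    metrics: List[Metric] = field(default_factory=list)

@dataclass
class TestabilityContext:
    name: str                                            # TF1: context-specific
    characteristics: List[Characteristic] = field(default_factory=list)
    relations: List[str] = field(default_factory=list)   # TF2: statements

unit_testing = TestabilityContext(
    name="unit testing, measured on class diagrams",
    characteristics=[Characteristic("unit size", [
        Metric("local features", lambda cls: float(len(cls["local"]))),
        Metric("inherited features", lambda cls: float(len(cls["inherited"]))),
    ])],
    relations=["more local features -> more test cases -> higher unit-testing cost"],
)
```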
C. Software Testability and Non-Functional Properties
As highlighted before, software testability and its relationship to non-functional properties is a relatively unexplored field. However, two literature reviews on software testability and its relationship to the non-functional properties robustness [3] and performance [12] were recently published.

The literature review on software testability and robustness includes, overall, 27 primary studies. The most frequently addressed testability issues investigated in the context of robustness are observability, controllability, automation, and testing effort. The most frequently addressed robustness issues are fault tolerance, handling external influence, and exception handling. Metrics that consider testability and robustness together are rare. In general, authors report a positive relationship between software testability and software robustness [3].

The literature review on software testability and performance includes, overall, 26 primary studies. The most frequently addressed testability issues investigated in the context of performance are observability, controllability, automation, and testing effort. Note that the most frequently addressed testability issues in the contexts of robustness and performance are identical. The most frequently addressed performance issues are timeliness, response time, and memory usage. Again, metrics that consider testability and performance together are rare.

Furthermore, González et al. [13] present a measurement framework for the runtime testability of component-based systems that relates testability and performance. As runtime testing is, unlike traditional testing, performed on the final execution environment, it interferes with the system state or resource availability. The framework therefore identifies the test sensitivity characteristics component state, component interaction, and resource limitations and availability, which determine whether testing interferes with the state of the running system or its environment in an unacceptable way, as well as the test isolation techniques state separation, interaction separation, resource monitoring and scheduling, which provide countermeasures for test sensitivity.

TN1 The most frequently addressed testability issues for robustness and performance are observability, controllability, automation, and testing effort.
TN2 The most frequently addressed robustness issues are fault tolerance, handling external influence, and exception handling.
TN3 The most frequently addressed performance issues are timeliness, response time, and memory usage.

III. TOWARDS A MEASUREMENT FRAMEWORK FOR NON-FUNCTIONAL TESTABILITY
In this section, we first present the goals of a measurement framework for non-functional properties and then sketch our Testability Causation Analysis Framework, taking the findings from the previous section into account.
A. Overview and Goals
Our goal is to develop a measurement framework for non-functional properties based on the findings of the previous section. As testability is a relative concept (see TD1) and has different aspects (see TD3), it is not possible to develop a single measurement system that covers all non-functional properties, aspects and contexts. We need a general framework that can be adapted to these points of variation and be instantiated to provide guidance to conceptualise, analyse and measure testability in specific situations.

Available frameworks that have been successfully applied and evaluated for functional testability analysis often take a layered approach and add detail for a specific testability context, set of characteristics and related variables (see TF1). Based on our analysis of existing frameworks above, we adapt and extend the object-oriented testability framework (OOTF) of Mouchawrab et al. [5] to address the testability of non-functional properties. Their framework is practical and can be used both for approximate, qualitative assessment of testability ('Would testability increase or decrease, given a certain change?') and as a basis for more exact, quantitative assessment ('How much will testability increase or decrease given a change of this size in this variable?'). A basic assumption it makes is also that the cost to test to a certain level of quality is a natural and hands-on way to conceptualise testability. We thus reuse some aspects of the framework while adapting, extending and generalising it so that it can be applied not only during the analysis and design stages of object-oriented software but for the analysis of non-functional properties on any type of software system.

The reusable elements include the different levels and the decomposition of testability into characteristics, sub-characteristics and attributes. That allows the OOTF framework to be adapted towards specific conditions of testability [5]. However, it is not obvious that non-functional (NF) properties can be captured in this way. For example, the OOTF framework distinguishes the different levels of testing (unit, integration, system and regression), thus aggregating attributes from lower levels into the higher levels. That distinction between levels of testing is harder to make, or even not needed, when dealing with NF properties, since NF testing does not always apply, or differ, at all levels of testing. Moreover, the OOTF framework does not clearly include factors that account for other aspects of testability such as the process, the company/environment, or the considered testing techniques.

To summarise, our contributions relative to the existing framework are four-fold: i) to generalise from OO software to any type of software system, ii) to focus on non-functional testability rather than functional, iii) to clarify that the same framework can be used both qualitatively and quantitatively, and iv) to consider more types of factors of the situation than only design-related factors of the SUT. In the following, we further detail our proposed framework, called the Testability Causation Analysis Framework.

B. TCAF: Testability Causation Analysis Framework
TCAF is mostly to be used qualitatively, but we see a natural extension to quantitative use. Our adaptation focuses on analysing testability in terms of the inputs that mediate or directly affect it (e.g., the SUT, the test technique(s) being used, human and organisational facets, etc.) and their effects on testability outputs (primarily the cost and effectiveness of testing).

For testability outputs, we argue that NF properties are typically not atomic and need to be broken down into sub-characteristics or issues. This allows a more detailed analysis. For instance, if we choose robustness as the NF property, there are the sub-characteristics identified in the literature review of Hassan et al. [3], i.e., exception handling, fault tolerance, and handling of external influences. Depending on the specific NF property and the level of detail one wants, these might need to be further sub-divided into characteristics. Once this division has been made, we have identified a set of NF attributes. For each attribute we then identify specific testability outputs.

An underlying aspect of testability is to measure the time/effort/cost needed to perform a certain type of testing [6], [5], which we will refer to simply as TestCost. Therefore, all NF sub-characteristics should be connected to a cost variable. Conversely, in order to capture effectiveness, i.e., the (quality) level to which the testing of the NF (sub-)characteristic has been achieved, we need attribute-specific variables that will often vary depending on a variety of factors. We refer to those attribute-specific variables as the extent of testability, or simply TestabilityExtent. (Note that we explicitly exclude efficiency here, since it can be defined as effectiveness divided by cost and is thus indirectly being analysed via its sub-components.)

Note that, in some contexts, the extent can be a binary variable where stakeholders do (or do not) have the necessary instruments and dependencies to test the NF attribute, i.e., it is not necessarily continuous. Other scenarios allow a degree of the extent to which testability can be measured (similar to coverage variables). For instance, a situation where a test technique can only be partially applied would mean a reduced extent of the measured testability. A typical example would be when there is a fixed time or cost for conducting a certain type of testing.

In brief, our framework thus decomposes testability into several levels, beginning with the non-functional property of interest and then further into, potentially, several levels of sub-characteristics to arrive at the NF attributes we consider. Each such NF attribute is then connected to testability output variables (TOVs), i.e., TestCost and TestabilityExtent, that capture aspects of the testability factors in terms of cost and extent of testing. The main idea of TCAF is then to consider which input factors would cause a change in these NF TOVs. These input factors are captured in testability input variables (TIVs) that are typically of at least three types: those that capture i) the surrounding environment, namely the context (e.g., team configuration, processes used, experience with the used test techniques), ii) the system under test (e.g., system complexity, number of test interfaces, number of arguments and types of those interfaces), and iii) the test techniques considered (e.g., test optimisation and test generation). Given the input factors and the output factors they (potentially) affect, one can then proceed to qualitatively analyse the direction and strength of this causation, or to model it statistically and thus be able to predict those effects.

The following steps further detail how to use the TCAF framework:
1) Identify Testability Output Variables (TOVs) specific to the non-functional property considered and its different sub-characteristics. These variables will always include the TestCost variable, but can also have TestabilityExtent variables. Outputs: layered decomposition of the NF property into NF attributes and testability output variables for each of the attributes.
2) Identify the set of test techniques to be considered or compared in terms of testability. If it is already given that a certain technique can only reach a certain degree of TestabilityExtent for an NF attribute, it need not be further modelled in subsequent steps. If the test techniques imply specific sub-activities in order to be applied, refine the TestCost variables from Step 1 to be specific for each sub-activity (designing, executing, reporting the tests, etc.).
3) Identify system and context attributes that have an impact on TOVs, and define TIVs for them. The test techniques themselves might also have variation points that lead to additional TIVs to include.
4) Analyse the effect that TIVs have on TOVs. This can happen either qualitatively or quantitatively. For the latter, we need quantification of TIV values as well as statistical modelling of the TOVs based on the TIVs. For the former, one needs experience- or research-based reasoning about the level or direction of the effect.

We believe that the TCAF framework can help build a causal model of how different attributes/variables of the TIVs (e.g., context/SUT/test techniques) determine different aspects of testability in terms of TOVs, i.e., very much how it has been done in other disciplines [14]. An added benefit is to be able to quantify those variables and, eventually, statistically model the strength of their effect on testability. Given recent progress on actually analysing causality, rather than simply correlating variables with statistical methods, this would now be realistic [15]. This feature would be highly relevant and useful for estimating/predicting the TOVs related to cost, while it may be harder to quantify and then predict TOVs measuring the testability extent.
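As a hypothetical illustration, the outputs of Steps 1-3 can be captured as plain data on which the Step 4 analysis then operates. The sketch below anticipates the robustness example of the next section; all concrete names are assumptions made for illustration, not prescribed by TCAF:

```python
# Sketch of a TCAF instance (outputs of Steps 1-3); names are illustrative.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class NFAttribute:
    """Step 1: a leaf of the layered NF decomposition with its TOVs."""
    name: str
    tovs: List[str]

@dataclass
class TCAFInstance:
    nf_property: str
    attributes: List[NFAttribute]               # Step 1
    test_techniques: List[str]                  # Step 2
    tivs: Dict[str, List[str]] = field(default_factory=dict)  # Step 3, by type
    # Step 4 (qualitative): (TIV, TOV) -> expected direction of effect
    effects: Dict[Tuple[str, str], str] = field(default_factory=dict)

robustness = TCAFInstance(
    nf_property="robustness",
    attributes=[
        NFAttribute("handling atypical inputs", ["CostAtypical", "ExtentAtypical"]),
        NFAttribute("handling invalid inputs", ["CostInvalid", "ExtentInvalid"]),
    ],
    test_techniques=["GodelTest"],
    tivs={"context": ["automated-testing experience", "SBST experience"],
          "SUT": ["input complexity", "SUT connectivity"],
          "technique": ["generators", "mutators", "distances"]},
)
```

A qualitative Step 4 then fills the effects map with directions of influence, while a quantitative Step 4 would replace it with a fitted statistical model over measured TIVs.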
IV. EXAMPLE APPLICATION: ROBUSTNESS TESTING
This section sketches an example of how TCAF can be applied for testing robustness. An overview of the relevant TIVs is shown in Figure 1.
[Figure 1 depicts the TIVs of the example as three groups feeding into robustness testability: GödelTest factors (generators, mutators, distances), SUT factors (SUT connectivity, input complexity), and context factors (automated testing experience, SBST experience).]

Fig. 1. Testability causation analysis example, focusing on robustness.
In the following, we explain each of the four steps to instantiate TCAF.
Step 1: Testability output variables. For this example we will focus on robustness. For the sake of brevity, we only consider two exception-handling sub-characteristics of robustness: the system's ability to handle atypical and invalid inputs. The TOVs are the cost for and extent to which we can test the two NF attributes: CostAtypical, ExtentAtypical, CostInvalid, ExtentInvalid.

Step 2: Test technique. We consider a single test technique in this example: GödelTest, a search-based, automated test data generation technique. GödelTest [16] has been shown to be useful for robustness testing [17]. It defines a method of developing valid test inputs of any complexity using a generator, to explore invalid test data using one or more mutation operators, all driven by a distance metric to assess how "far away" the generated inputs are from the typical test cases. Each of the three components (generator, mutation operators, distance metric) needs to be in place for the technique to work, so the TestCost associated with each will be assessed separately. When applying this test technique to a large software under test (SUT) we can further consider all these factors for each and every interface of the SUT that we want to test for robustness, but for the sake of this example we only consider one interface.
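The three components can be pictured with a toy sketch for a single interface whose valid inputs are arrays of small non-negative integers. This is not the actual GödelTest implementation [16]; it merely mirrors its structure:

```python
# Toy sketch of the three GodelTest-style components for one interface.
import random

def generator(max_len=10):
    """Produce a valid input (the generator encodes what 'valid' means)."""
    return [random.randint(0, 100) for _ in range(random.randint(1, max_len))]

def mutate(xs):
    """Mutation operator: perturb a valid input into a (likely) invalid one."""
    ys = list(xs)
    ys[random.randrange(len(ys))] = random.choice([-1, 10**9])  # out of range
    return ys

def distance(xs, typical_len=5, typical_mean=50.0):
    """Distance metric: how 'far away' an input is from typical test cases."""
    mean = sum(xs) / len(xs)
    return abs(len(xs) - typical_len) + abs(mean - typical_mean) / typical_mean

valid = generator()
candidate = mutate(valid)   # search would favour candidates with high distance
print(distance(valid), distance(candidate))
```

The TestCost of adopting the technique is then dominated by writing such a generator, mutation operators, and distance metric for each real interface.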
Step 3: System and context variables. An example of a context attribute that would have an impact on the cost of adopting the technique is the relative experience that the company and its testers and developers have with automated testing tools in general, and with search-based software testing (SBST) tools and GödelTest in particular. The more experienced the testers and developers are, and the more experienced the company is in developing and using automated testing tools, the lower the costs are likely to be. In addition, the complexity of the SUT is also likely to be an important factor. For example, cost is likely to increase with the number and relative complexity of input data types: it is clearly much easier to define a generator for arrays of integers than for graphs represented in XML files.
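Anticipating the qualitative part of Step 4, the TIVs just identified can be recorded together with the direction of effect we expect them to have on the cost TOVs. The names and directions below are assumptions made for this example:

```python
# Hypothetical TIVs for the example, with the expected direction of effect
# on the cost TOVs (input to the qualitative Step 4 analysis).
expected_effects = {
    ("context", "automated-testing experience"): "decreases TestCost",
    ("context", "SBST/GodelTest experience"):    "decreases TestCost",
    ("SUT", "number of input data types"):       "increases TestCost",
    ("SUT", "input data type complexity"):       "increases TestCost",
}
for (group, tiv), effect in expected_effects.items():
    print(f"{group:>7} | {tiv}: {effect}")
```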
Step 4: Causal effects. The effects can be analysed depending on the amount of information available, and this analysis can be updated over time. An initial evaluation would most likely be qualitative, focusing on whether each of the TIVs has an effect, and whether that effect is likely to be positive or negative. A company may conclude that it does not have many testers or developers with SBST experience, and that this is likely to have a negative impact on the cost of adopting GödelTest. Or it might decide that applying robustness testing on all interfaces is not called for and the testing needs to be more focused. As more information becomes available, the analysis can be refined, first as a qualitative analysis focusing on discrete steps. For example, when looking at the components of GödelTest, the company may conclude that it has a number of testing tools that allow the generation of inputs for their SUTs. Thus, generators are available for a relatively low cost. At the other extreme, mutation operators would likely be custom, incurring significant cost to develop and validate, in particular if the input data types are complex and company- or system-specific.

While the analyses for CostAtypical and CostInvalid should be quite similar, there is a difference in the number and type of mutation operators needed; the mutation operators for generating atypical inputs are much less complex since we are using the generator as is (atypical inputs are still valid and thus should be captured in the way the generator is defined). Similarly, there are many more invalid data than valid, and thus atypical, data, so ExtentInvalid will have to be much more constrained and will directly affect CostInvalid. This indicates that more complex analysis or statistical modelling might be needed. It is not always the case that testability outputs can be predicted only from the inputs; outputs might sometimes influence each other.

When possible, the analysis would move more toward quantitative assessments (a minimal modelling sketch is given below) and include more attributes and factors. For robustness, we could consider other robustness aspects from the literature [18] as well. A company with experience in working with SBST systems, for example, may be able to estimate the cost of implementing GödelTest quite accurately, as well as have a clearer understanding of the effect the implementation would have on its products. However, regardless of the level of detail used, TCAF can help structure the testability analysis and make it concrete.
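To sketch what a quantitative Step 4 could look like, the example below fits a simple linear model of TestCost on two quantified TIVs. The observations are synthetic and purely illustrative, and a real analysis would also need to heed the distinction between correlation and causation [15]:

```python
# Sketch of a quantitative Step 4: least-squares fit of TestCost on two
# quantified TIVs. The observations are synthetic, for illustration only.
import numpy as np

# TIVs per past project: [SBST experience (years), input complexity (1-5)]
X = np.array([[0, 1], [1, 2], [3, 2], [5, 4], [2, 5]], dtype=float)
y = np.array([40.0, 35.0, 22.0, 18.0, 45.0])   # observed TestCost (person-hours)

X1 = np.hstack([np.ones((len(X), 1)), X])       # prepend an intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("intercept:", coef[0], "experience effect:", coef[1],
      "complexity effect:", coef[2])
```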
V. CONCLUSION
In this paper we present a testability causation analysis framework for non-functional properties. The framework is developed based on available frameworks and review studies on testability, and prototypically applied to robustness testing.

The framework is used in four steps. First, testability output variables, including test cost and testability extent, are identified. Second, the set of test techniques to be considered is identified. Third, system and context attributes are identified as testability input variables. Fourth, the effect that testability input variables have on testability output variables is analysed.

So far the framework has not been evaluated. In future work, we therefore plan to refine and evaluate the testability causation analysis framework for different non-functional properties, including robustness, performance, security and energy consumption (as well as their inter-dependence), in different contexts.

ACKNOWLEDGMENT
The paper was partly funded by the Knowledge Foundation (KKS) of Sweden through the project 20130085: Testing of Critical System Characteristics (TOCSYC).

REFERENCES

[1] J. M. Voas and K. W. Miller, "Software testability: The new verification," IEEE Software, vol. 12, no. 3, pp. 17–28, 1995.
[2] V. Garousi, M. Felderer, and F. Nur Kilicaslan, "What we know about software testability: A survey," ArXiv e-prints, Jan. 2018.
[3] M. M. Hassan, W. Afzal, M. Blom, B. Lindström, S. F. Andler, and S. Eldh, "Testability and software robustness: A systematic literature review," in Proceedings of the 41st Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 2015, pp. 341–348.
[4] ISO/IEC, "ISO/IEC 25010:2011 Systems and software engineering – Systems and software quality requirements and evaluation (SQuaRE) – System and software quality models," 2011.
[5] S. Mouchawrab, L. C. Briand, and Y. Labiche, "A measurement framework for object-oriented software testability," Information and Software Technology, vol. 47, no. 15, pp. 979–997, 2005.
[6] ISO/IEC/IEEE, "ISO/IEC/IEEE 24765:2010 Systems and software engineering – Vocabulary," 2010.
[7] T. Yu, W. Wen, X. Han, and J. H. Hayes, "Predicting testability of concurrent programs," in Proceedings of the 10th IEEE International Conference on Software Testing, Verification and Validation (ICST). IEEE, 2016, pp. 168–179.
[8] R. Poston, J. Patel, and J. S. Dhaliwal, "A software testing assessment to manage project testability," in Proceedings of the 20th European Conference on Information Systems (ECIS).
[9] …
[10] … in Proceedings of Technology of Object-Oriented Languages and Systems. IEEE, 1999, pp. 96–107.
[11] R. V. Binder, "Design for testability in object-oriented systems," Communications of the ACM, vol. 37, no. 9, pp. 87–101, 1994.
[12] M. M. Hassan, W. Afzal, B. Lindström, S. M. A. Shah, S. F. Andler, and M. Blom, "Testability and software performance: A systematic mapping study," in Proceedings of the 31st Annual ACM Symposium on Applied Computing. ACM, 2016, pp. 1566–1569.
[13] A. González, E. Piel, and H.-G. Gross, "A model for the measurement of the runtime testability of component-based systems," in Software Testing, Verification and Validation Workshops (ICSTW). IEEE, 2009, pp. 19–28.
[14] G. Imbens and D. Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, 2015.
[15] J. Peters, D. Janzing, and B. Schölkopf, Elements of Causal Inference: Foundations and Learning Algorithms, ser. Adaptive Computation and Machine Learning Series. Cambridge, MA, USA: The MIT Press, 2017.
[16] R. Feldt and S. Poulding, "Finding test data with specific properties via metaheuristic search," in …, Nov. 2013, pp. 350–359.
[17] S. Poulding and R. Feldt, "Generating controllably invalid and atypical inputs for robustness testing," in …, Mar. 2017, pp. 81–84.
[18] A. Shahrokni and R. Feldt, "A systematic review of software robustness," …