Towards Fault Localization via Probabilistic Software Modeling

Abstract

Software testing helps developers to identify bugs. However, awareness of bugs is only the first step. Finding and correcting the faulty program components is equally hard and essential for high-quality software. Fault localization automatically pinpoints the location of an existing bug in a program. It is a hard problem, and existing methods are not yet precise enough for widespread industrial adoption. We propose fault localization via Probabilistic Software Modeling (PSM). PSM analyzes the structure and behavior of a program and synthesizes a network of Probabilistic Models (PMs). Each PM models a method with its inputs and outputs and is capable of evaluating the likelihood of runtime data. We use this likelihood evaluation to find fault locations and their impact on dependent code elements. Results indicate that PSM is a robust framework for accurate fault localization.



Hannes Thaller, Lukas Linsbauer, Alexander Egyed

Institute for Software Systems Engineering, Johannes Kepler University Linz, Austria
{hannes.thaller, lukas.linsbauer, alexander.egyed}@jku.at

Stefan Fischer

Software Competence Center Hagenberg GmbH, Austria
stefan.fi[email protected]


Index Terms: fault localization, probabilistic modeling, multivariate testing, software modeling, static code analysis, dynamic code analysis, runtime monitoring, inference, simulation, deep learning

I. INTRODUCTION

Modern software development aims to design and control the quality of software. Testing techniques such as unit, integration, or system testing, together with their automation via continuous integration, provide a feasible and generally applicable approach to software quality assurance. Software testing aims to find faults in a program. However, tests cannot localize the faults within a program's source code. This is not an issue for unit testing, since unit tests target small units (typically single methods). For integration and system tests, however, fault localization can become a time-consuming task.

Fault Localization (FL) is the task of automatically finding faults in a program so that a developer or an automated process can repair them. Finding a fault, i.e., the real cause of an error, is a hard problem. Not only is it difficult to distinguish a symptom (cascading error) from a cause (actual fault), but multiple faults can also act in conjunction, complicating the localization process. State-of-the-art FL techniques such as Spectrum-based Fault Localization (SBFL) [1], [2] traditionally rank statements by their likelihood of containing a fault. This leads to localization weaknesses for complex faults that span multiple lines [3] (76% of faults) or that are caused by the omission of statements (30% of faults) [4], [5].

We propose Fault Localization via Probabilistic Software Modeling (FL-PSM). PSM [6] builds a network of Probabilistic Models (PMs) of the executables (e.g., methods in Java) in a program. We use the PMs built by PSM to locate the most likely fault location. FL-PSM is a dynamic approach that fits each PM using either the test suite (as test-based FL techniques do) or the actual execution of a program. PSM uses runtime data to construct behavioral datasets with which it fits the PMs. Then, runtime data from another program version, or from failing tests, are used to find the most likely fault location.

II. RUNNING EXAMPLE

As an illustrative example, we use the Nutrition Advisor, which takes a person's anthropometric measurements (e.g., height and weight) and returns a piece of textual advice based on the Body Mass Index (BMI). The class diagram in Figure 1 (left) shows the four classes of the Nutrition Advisor. The sequence diagram in Figure 1 (right) shows a possible runtime trace of a request handled by the program. The Servlet handles requests (handle()) and initializes a Person object. This Person object is received by the NutritionAdvisor.advice method, which extracts the person's height (168.59) and weight (69.54). Both values are the parameters of the BmiService.bmi call, which returns the BMI (24.466). Based on this BMI, a piece of textual advice is returned ("You are healthy, ...").

III. BACKGROUND

Probabilistic Software Modeling [6] describes a methodology for transforming a program into a network of probabilistic models. It extracts a program's structure, represented by properties, executables, and types (fields, methods, and classes in Java), along with their call dependencies, to build a network of probabilistic models. Every node in the network is a PM that represents an executable. Each PM is fitted to runtime data from program executions. These execution traces are either recorded from the system in its production environment or triggered via tests.

Each code element PM represents an executable (e.g., a Java method) in the program. Inputs are parameters, property reads, and invocation return values, while outputs are the method return value, property writes, and invocation parameters. The distinction between inputs and outputs exists only on a logical level for the program. The models themselves are multivariate density estimators (unsupervised models) with no notion of input and output, i.e., they are joint models of all variables.

[Fig. 1 image: class diagram of Servlet (handle(...)), NutritionAdvisor (bmiService: BmiService; advice(person: Person): String), Person (height: float, weight: float, gender: String), and BmiService (bmi(height: float, weight: float): float); sequence diagram of one request through handle(), advice(person), and bmi(height, weight) returning 24.466 and the advice "You are healthy, try a ..."]

Fig. 1. Class Diagram (left) and Sequence Diagram (right) of the Nutrition Advisor [6].
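The PMs used in this work are flow-based density estimators (NVPs [7], [8]). They model all of a method's variables jointly and support both operations discussed in this section: evaluating the likelihood of an observation and generating new observations. The sketch below illustrates both with a single affine coupling layer in the style of Real NVP [7]. It rests on several assumptions. The two variables are assumed to be already standardized (e.g., a hypothetical standardized height/weight pair of Person.init). The coupling functions `scale` and `shift` are fixed toy functions, not trained neural networks. All function names are illustrative and not part of PSM.

```python
import math
import random

# Hypothetical, untrained coupling functions. In Real NVP [7] these are
# neural networks learned from data; fixed toys keep the sketch small.
def scale(x1):
    """Log-scale applied to x2, conditioned on x1."""
    return 0.1 * math.tanh(x1)


def shift(x1):
    """Shift applied to x2, conditioned on x1."""
    return 0.5 * x1


def to_latent(x):
    """Data -> latent: x1 passes through, x2 is shifted and rescaled given x1.

    Returns the latent point z and log|det dz/dx| of the transformation.
    """
    x1, x2 = x
    z2 = (x2 - shift(x1)) * math.exp(-scale(x1))
    return (x1, z2), -scale(x1)


def to_data(z):
    """Latent -> data: the exact inverse of to_latent, used for sampling."""
    z1, z2 = z
    return (z1, z2 * math.exp(scale(z1)) + shift(z1))


def std_normal_logpdf(v):
    """Log-density of one standard Gaussian latent variable."""
    return -0.5 * v * v - 0.5 * math.log(2.0 * math.pi)


def log_likelihood(x):
    """Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|."""
    z, log_det = to_latent(x)
    return sum(std_normal_logpdf(v) for v in z) + log_det


def sample(rng):
    """Draw a latent Gaussian point and map it into data space."""
    return to_data((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)))


rng = random.Random(7)
# Generate one joint observation and evaluate its likelihood under the model.
x = sample(rng)
print(x, log_likelihood(x))
```

Because the Jacobian of a coupling layer is triangular, its log-determinant is simply `-scale(x1)`, which is what keeps likelihood evaluation cheap. A practical NVP stacks several such layers and alternates which variable passes through unchanged, so that every variable is eventually transformed.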

Each model can generate new observations that are similar to the data it was trained on, e.g., to generate likely or rare (but plausible) test data. Furthermore, each model can evaluate the likelihood of a given observation, e.g., to evaluate the adequacy of given test data. This evaluation is relative to the runtime trace that was used to fit the model. For example, a model fitted on production runtime data will evaluate observations differently than a model fitted on test executions.

PMs in this work are Non-Volume Preserving Transformations (NVPs) [7], [8], which are general and expressive flow-based density estimators. Each NVP is built from neural networks that learn an invertible function mapping latent random variables (e.g., Gaussian variables) to the data (runtime events). To evaluate the likelihood with an NVP, the runtime events are transformed into the known Gaussian latent space, the Gaussian likelihood of the transformed events is computed, and the result is corrected by the log-determinant of the transformation's Jacobian. More details on PSM and NVPs are given in our previous work [6] and by Dinh et al. [7] and Papamakarios et al. [8].

IV. APPROACH

FL-PSM is built upon PSM, and its fault localization is based on the likelihood evaluation of the PMs. The inputs are a null-model $M^{\mathrm{null}}$ of an executable and either an alt-dataset $D^{\mathrm{alt}}$ of runtime events or an alt-model $M^{\mathrm{alt}}$ from which such a dataset is generated. FL-PSM localizes faults by computing the mean log-likelihood of $D^{\mathrm{alt}}$ under $M^{\mathrm{null}}$ and comparing it to a critical value. More specifically,

$$LL_{D^{\mathrm{alt}}} = \frac{1}{N} \sum_{i=1}^{N} \log p_{M^{\mathrm{null}}}\left(D^{\mathrm{alt}}_i\right) \qquad (1)$$

computes the average log-likelihood, where $N$ is the number of data points in $D^{\mathrm{alt}}$. Finally,

$$LL_{D^{\mathrm{alt}}} - LL_{D^{\mathrm{null}}} < c \qquad (2)$$

evaluates whether there is a significant difference between model and data. $LL_{D^{\mathrm{null}}}$ is the mean log-likelihood of $M^{\mathrm{null}}$ on its own data $D^{\mathrm{null}}$ and captures the model's inherent bias. As in other significance tests, the critical value $c$ controls for Type-1 errors (false positives). For example, $c = \log(0.001)$ means that 1 out of 1000 events is falsely considered to be significantly different from the model.

V. PRELIMINARY STUDY

This preliminary study shows how FL-PSM finds possible fault locations. We sent 3000 requests to the Nutrition Advisor based on data from the NHANES [9] dataset. The resulting model is the null-model $M^{\mathrm{null}}$. We then seeded two errors in the Nutrition Advisor and collected the alt-datasets $D^{\mathrm{alt}}_1$ and $D^{\mathrm{alt}}_2$. The first error simulates a regression (between versions) caused by a typo in the Person constructor that assigns -weight instead of weight to the field. The second error simulates an integration fault (within a version) caused by miscommunication between teams that use different units of measurement. Team A, which also built the null Nutrition Advisor, computes the BMI using meters, while Team B, which revises the implementation, computes the BMI using inches.

We used the computation from Section IV with a critical value (i.e., false-positive rate) of $c = \log(0.001) \approx -6.91$. Log-likelihood differences below this value are therefore considered to diverge significantly from the model.

A. Regression Fault

Figure 2 shows the runtime behavior of a subset of code elements of the Person.init and NutritionAdvisor.advice models. Table I lists the likelihood and significance of these elements, along with the multivariate model likelihood that considers all elements at once. The visualization of the code elements lets developers see that there is a significant difference between the model and the observations. The constructor parameter init.weight is aligned with the model, while the property writes to Person.weight are clearly different. Table I confirms that this difference is significant (rows 1 and 4). As both the visualization and the table show, the other elements in the same model do not differ significantly.

The difference propagates to the dependent NutritionAdvisor.advice method, which reads the Person.weight property (rows 5 and 7). The invocation of BmiService.bmi also shows this significant divergence (row 8).
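The test of Eqs. (1) and (2) can be sketched on a toy version of this regression. The sketch rests on several assumptions. Each PM is replaced by a one-dimensional Gaussian per code element, a deliberately simple stand-in for the NVPs. Heights and weights are synthetic Gaussian draws, not NHANES data. The hypothetical `bmi` function assumes centimeters and kilograms, which reproduces the 24.466 of the running example. The names `run`, `fit_gaussian`, and `diverges` are illustrative only.

```python
import math
import random
import statistics


def bmi(height_cm, weight_kg):
    """Hypothetical BmiService.bmi; cm/kg units are an assumption."""
    return weight_kg / (height_cm / 100.0) ** 2


def fit_gaussian(xs):
    """Toy stand-in for a PM: a 1-D Gaussian fitted to one code element."""
    return statistics.fmean(xs), statistics.stdev(xs)


def log_pdf(model, x):
    mu, sigma = model
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def mean_log_likelihood(model, data):
    """Eq. (1): average log-likelihood of the data under the null model."""
    return sum(log_pdf(model, x) for x in data) / len(data)


def diverges(model, d_null, d_alt, c=math.log(0.001)):
    """Eq. (2): significant if LL(D_alt) - LL(D_null) < c."""
    return mean_log_likelihood(model, d_alt) - mean_log_likelihood(model, d_null) < c


def run(n, regressed, seed=0):
    """Simulate n requests and record the observed values per code element."""
    rng = random.Random(seed)
    trace = {"init.weight": [], "Person.weight": [], "bmi.return": []}
    for _ in range(n):
        h, w = rng.gauss(168.0, 10.0), rng.gauss(70.0, 12.0)  # synthetic, not NHANES
        stored = -w if regressed else w  # seeded typo in the Person constructor
        trace["init.weight"].append(w)
        trace["Person.weight"].append(stored)
        trace["bmi.return"].append(bmi(h, stored))
    return trace


null_trace = run(3000, regressed=False)
alt_trace = run(3000, regressed=True, seed=1)
for element, values in null_trace.items():
    print(element, diverges(fit_gaussian(values), values, alt_trace[element]))
```

With these toy settings, the test flags Person.weight and bmi.return but not the constructor parameter init.weight. This mirrors the qualitative pattern described above, but the sketch does not reproduce the values in Table I.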

B. Integration Fault

Figure 3 shows the runtime behavior of a subset of code elements of Servlet.handleRequest calling Person.init. In this case, the Servlet.handleRequest model evaluates the parameters or return values of Person.init. The visualization shows no significant difference in the integration between Servlet and Person, and Table II confirms this. The integration between NutritionAdvisor.advice and BmiService.bmi, with the former being the model, shows a difference in the return value of BmiService.bmi. This difference is also reflected in Table II (rows 5 and 8).

[Fig. 2 image: density plots (y-axis: density; legend "Source": Model, Observation) for Person.init and NutritionAdvisor.advice, covering init.height, init.weight, Person.height, Person.weight, and BmiService.bmi]

Fig. 2. A subset of elements in the regression fault setting. For example, the first row shows Person.init parameters and property writes between the original version and a regressed version of the same component.

TABLE I. LIKELIHOOD VALUES OF A SUBSET OF ELEMENTS IN THE REGRESSION FAULT SETTING.
Columns: Model, Element, Cardinality, LL, Sig. (✔ = significant, ✖ = not significant)
Row 1: Person.init, init, multivariate, LL −6787, ✔
Rows 2–8: Model, Element, Cardinality, and LL values are not recoverable from this copy; Sig. ✖, ✖, ✔, ✔, ✖, ✔, ✔

VI. DISCUSSION

The preliminary study showed how FL-PSM localizes faults. This localization is automated via likelihood-based significance tests that allow for statistical control of the false-positive rate. The other important aspect is the visualization of the faults (Figures 2 and 3) and of their impact on dependent elements. This allows for a precise analysis of the error chain and its influence across the program.

FL-PSM can only be applied if at least one version of the program exists from which to build the null-model. From an industrial point of view, this is not an issue, since FL-PSM can be used after a few development sprints. Another consideration is that FL-PSM localizes all behavioral changes, including intended ones. Intended changes can be filtered out by incorporating source code change information into the localization process. In addition, the visualization capabilities of FL-PSM allow for quick manual inspections in cases of doubt.

In summary, the results and usability of FL-PSM are promising. Nevertheless, open questions remain concerning multiple fault sources and their clear separation.

TABLE II. LIKELIHOOD VALUES OF A SUBSET OF ELEMENTS IN THE INTEGRATION FAULT SETTING.
Columns: Model, Element, Cardinality, LL, Sig. (✔ = significant, ✖ = not significant)
Row 1: Servlet.handle, Person.init, multivariate, LL 0, ✖
Rows 2–8: Model, Element, Cardinality, and LL values are not recoverable from this copy; Sig. ✖, ✖, ✖, ✔, ✖, ✖, ✔

VII. RELATED WORK

Most fault localization techniques are slice-, spectrum-, statistics-, model-, or machine-learning-based [1], [3]. The technique most similar to FL-PSM is Spectrum-Based Fault Localization (SBFL) [1]. SBFL techniques observe passing and failing executions and perform statistical inference on the results. The result is a list of statements ranked by their likelihood of being the fault location. Although the two are similar, FL-PSM differs in terms of abstraction level and source model. PSM abstracts away statements and only considers properties, executables, and types along with their call dependencies. In contrast, SBFL techniques predominantly work on the statement level. This might seem like a drawback at first. However, Parnin and Orso [5] identified the level of detail of the results, in combination with high false-positive rates, as one of the main reasons for the low industrial adoption of SBFL. PSM addresses these issues by providing control over the false-positive rate and a coarser level of abstraction (executables).

[Fig. 3 image: density/mass plots (legend "Source": Model, Observation) for Servlet.handleRequest -- Person.init (init.height, init.weight, init.gender) and NutritionAdvisor.advice -- BmiService.bmi (bmi.height, bmi.weight, bmi.return)]

Fig. 3. A subset of elements in the integration fault setting. For example, the first row shows Person.init parameter values caused by the invocation from Servlet.handleRequest.

VIII. CONCLUSION AND FUTURE WORK

We presented Fault Localization via Probabilistic Software Modeling (FL-PSM). FL-PSM builds upon PSM and uses statistical inference to find possible fault locations in a program. The localization is based on evaluating the likelihood of runtime events under the model. We have shown how FL-PSM localizes and visualizes faults. In addition, we discussed the differences between FL-PSM and its close relative, SBFL.

Future work will focus on a full evaluation of the approach on multiple complex subsystems. Furthermore, we want to conduct a user study to assess its practicality and applicability.

In conclusion, FL-PSM is a promising new FL approach built upon PSM that provides a general framework for the probabilistic analysis of software programs.

ACKNOWLEDGMENTS

The research reported in this paper has been supported by the Austrian ministries BMVIT and BMDW, and the Province of Upper Austria in terms of the COMET - Competence Centers for Excellent Technologies Programme managed by FFG.

REFERENCES

[1] J. Jones, M. Harrold, and J. Stasko, "Visualization of test information to assist fault localization," in Proceedings of the 24th International Conference on Software Engineering (ICSE 2002), May 2002, pp. 467–477.
[2] W. E. Wong, V. Debroy, R. Gao, and Y. Li, "The DStar Method for Effective Software Fault Localization," IEEE Transactions on Reliability, vol. 63, no. 1, pp. 290–308, Mar. 2014.
[3] W. E. Wong, R. Gao, Y. Li, R. Abreu, and F. Wotawa, "A Survey on Software Fault Localization," IEEE Transactions on Software Engineering, vol. 42, no. 8, pp. 707–740, Aug. 2016.
[4] S. Pearson, J. Campos, R. Just, G. Fraser, R. Abreu, M. D. Ernst, D. Pang, and B. Keller, "Evaluating and Improving Fault Localization," in …, May 2017, pp. 609–620.
[5] C. Parnin and A. Orso, "Are automated debugging techniques actually helping programmers?" in Proceedings of the 2011 International Symposium on Software Testing and Analysis (ISSTA '11). Toronto, Ontario, Canada: ACM Press, 2011, p. 199.
[6] H. Thaller, L. Linsbauer, R. Ramler, and A. Egyed, "Probabilistic Software Modeling: A Data-driven Paradigm for Software Analysis," arXiv:1912.07936 [cs], Dec. 2019.
[7] L. Dinh, J. Sohl-Dickstein, and S. Bengio, "Density estimation using Real NVP," arXiv:1605.08803 [cs, stat], May 2016.
[8] G. Papamakarios, E. Nalisnick, D. J. Rezende, S. Mohamed, and B. Lakshminarayanan, "Normalizing Flows for Probabilistic Modeling and Inference," arXiv:1912.02762 [cs, stat], Dec. 2019.
[9] CDC, "National Health and Nutrition Examination Survey Data,"