Towards Fault Localization via Probabilistic Software Modeling
Hannes Thaller, Lukas Linsbauer, Alexander Egyed, Stefan Fischer
Hannes Thaller, Lukas Linsbauer, Alexander Egyed
Institute for Software Systems Engineering, Johannes Kepler University Linz, Austria
{hannes.thaller, lukas.linsbauer, alexander.egyed}@jku.at
Stefan Fischer
Software Competence Center Hagenberg GmbH, Austria
stefan.fi[email protected]
Abstract—Software testing helps developers identify bugs. However, awareness of bugs is only the first step. Finding and correcting the faulty program components is equally hard and essential for high-quality software. Fault localization automatically pinpoints the location of an existing bug in a program. It is a hard problem, and existing methods are not yet precise enough for widespread industrial adoption. We propose fault localization via Probabilistic Software Modeling (PSM). PSM analyzes the structure and behavior of a program and synthesizes a network of Probabilistic Models (PMs). Each PM models a method with its inputs and outputs and is capable of evaluating the likelihood of runtime data. We use this likelihood evaluation to find fault locations and their impact on dependent code elements. Results indicate that PSM is a robust framework for accurate fault localization.
Index Terms—fault localization, probabilistic modeling, multivariate testing, software modeling, static code analysis, dynamic code analysis, runtime monitoring, inference, simulation, deep learning
I. INTRODUCTION
Modern software development aims to design and control the quality of software. Testing techniques, such as unit, integration, or system testing, and their automation via continuous integration provide a feasible and generally applicable approach to software quality assurance. Software testing aims to find faults in a program. However, tests cannot localize the faults within a program's source code. This is not an issue for unit testing, since unit tests are small (typically covering a single method). However, fault localization for integration and system tests can become a time-consuming task.

Fault Localization (FL) is the task of automatically finding faults in a program such that a developer or an automated process can repair them. Finding a fault, i.e., the real cause of an error, is a hard problem. Not only is it difficult to distinguish a symptom (cascading error) from a cause (actual fault), but multiple faults can also work in conjunction, complicating the localization process. State-of-the-art FL techniques like Spectrum-based Fault Localization (SBFL) [1], [2] traditionally rank statements by their likelihood of containing a fault. This leads to localization weaknesses for complex faults that span multiple lines [3] (76% of faults) or that are caused by the omission of statements (30% of faults) [4], [5].

We propose Fault Localization via Probabilistic Software Modeling (FL-PSM). PSM [6] builds a network of Probabilistic Models (PMs) of the executables (e.g., methods in Java) in a program. We use the PMs built by PSM to identify the most likely fault location. FL-PSM is a dynamic approach that uses either the test suite (like other test-based FL techniques) or the actual execution of a program to fit each PM. PSM uses runtime data to construct behavioral datasets with which it fits the PMs. Then, runtime data from another program version, or from failing tests, are used to find the most likely fault location.

II. RUNNING EXAMPLE
We use the Nutrition Advisor as an illustrative example. It takes a person's anthropometric measurements (e.g., height and weight) and returns a piece of textual advice based on the Body Mass Index (BMI). The class diagram in Figure 1 (left) shows the four classes of the Nutrition Advisor. The sequence diagram in Figure 1 (right) shows a possible runtime trace of a request handled by the program. The Servlet handles requests (handle()) and initializes a Person object. This Person object is received by the NutritionAdvisor.advice method, which extracts the person's height (168.59) and weight (69.54). Both values are the parameters of the BmiService.bmi call, which returns the BMI (24.466) on which the returned textual advice ("You are healthy, ...") is based.

III. BACKGROUND
Probabilistic Software Modeling [6] describes a methodology for transforming a program into a network of probabilistic models. It extracts a program's structure, represented by properties, executables, and types (fields, methods, and classes in Java), along with their call dependencies to build a network of probabilistic models. Every node in the network is a PM that represents an executable. Each PM in the network is fitted to execution traces of the program. These traces are either extracted from the system in its production environment or triggered via tests.

Each code element PM represents an executable (e.g., a Java method) in the program. Inputs are parameters, property reads, and invocation return values, while outputs are the method return value, property writes, and invocation parameters. The distinction between inputs and outputs exists only on a logical level for the program. However, the models themselves are multivariate density estimators (unsupervised models) with no notion of input and output, i.e., they are joint models of all variables.
[Figure 1: class diagram of Servlet, NutritionAdvisor, Person (height, weight, gender), and BmiService (left); sequence diagram of one request from servlet via advisor.advice(person) and bmiService.bmi(height, weight) = 24.466 to the advice "You are healthy, try a ..." (right)]
Fig. 1. Class Diagram (left) and Sequence Diagram (right) of the Nutrition Advisor [6].
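As a minimal sketch of how a flow-based PM can evaluate the likelihood of a runtime event and generate new ones, the following Python code implements a single affine coupling layer in the style of Real NVP [7] over a two-variable event (height, weight). It is not the PSM implementation: the constants MU and SIGMA and the functions shift, log_scale, to_latent, from_latent, sample, and log_likelihood are hypothetical, and shift and log_scale are hand-picked linear stand-ins for the neural networks that would be fitted to runtime data. The model is joint over both variables and has no notion of input or output.

```python
import math
import random

# Hypothetical parameters: in PSM they would be learned from runtime data;
# here they are hand-picked so the sketch runs without any training.
MU = (170.0, 70.0)     # assumed centre of (height in cm, weight in kg)
SIGMA = (10.0, 15.0)   # assumed spread of (height, weight)


def shift(u1: float) -> float:
    """t(u1): hypothetical stand-in for the translation network."""
    return 0.5 * u1


def log_scale(u1: float) -> float:
    """s(u1): hypothetical stand-in for the log-scale network."""
    return 0.1 * u1


def std_normal_logpdf(z: float) -> float:
    """Log-density of a standard Gaussian latent variable."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def to_latent(x):
    """Map an event x = (height, weight) to latent z; also return log|det J|."""
    u1 = (x[0] - MU[0]) / SIGMA[0]
    u2 = (x[1] - MU[1]) / SIGMA[1]
    z1 = u1                                            # passes through unchanged
    z2 = (u2 - shift(u1)) * math.exp(-log_scale(u1))   # affine coupling step
    log_det = -math.log(SIGMA[0]) - math.log(SIGMA[1]) - log_scale(u1)
    return (z1, z2), log_det


def log_likelihood(x) -> float:
    """Change of variables: log p(x) = log N(z) + log|det dz/dx|."""
    (z1, z2), log_det = to_latent(x)
    return std_normal_logpdf(z1) + std_normal_logpdf(z2) + log_det


def from_latent(z):
    """Inverse map: turn a latent Gaussian point into a runtime event."""
    u1 = z[0]
    u2 = z[1] * math.exp(log_scale(u1)) + shift(u1)
    return (u1 * SIGMA[0] + MU[0], u2 * SIGMA[1] + MU[1])


def sample(rng: random.Random):
    """Generate a new (height, weight) observation from the model."""
    return from_latent((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)))


# Usage: a plausible event vs. the same event with a negated weight.
print(log_likelihood((168.59, 69.54)))
print(log_likelihood((168.59, -69.54)))
print(sample(random.Random(0)))
```

Because the coupling step is triangular, its Jacobian determinant is just the product of the diagonal scale terms, which keeps likelihood evaluation cheap; a trained NVP stacks many such layers with alternating variable partitions.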
Each model can generate new observations that are similar to the data it was trained on, e.g., to generate likely or rare (but plausible) test data. Furthermore, each model can evaluate the likelihood of a given observation (e.g., to evaluate the adequacy of given test data). This evaluation is relative to the runtime trace that was used to fit the model, e.g., a model fitted on production runtime data will evaluate observations differently than a model fitted on test executions.

PMs in this work are Non-Volume Preserving transformations (NVPs) [7], [8], which are general and expressive flow-based density estimators. Each NVP is built from neural networks that learn a function mapping latent random variables (e.g., Gaussian variables) to the data (runtime events). Evaluating the likelihood with an NVP is done by transforming the runtime events into the known Gaussian latent space and computing the Gaussian likelihood of the transformed events, corrected by the log-determinant of the transformation's Jacobian. More details on PSM and NVPs are given in our previous work [6] and in [7], [8].

IV. APPROACH
FL-PSM is built upon PSM. Fault localization is based on the likelihood evaluation of these models. Given are a null-model $M_{null}$ of an executable and either an alt-dataset $D_{alt}$ of runtime events or an alt-model $M_{alt}$ from which such a dataset is generated. FL-PSM localizes faults by computing the mean log-likelihood of $D_{alt}$ under $M_{null}$ and comparing it to a critical value. More specifically,

$$LL_{D_{alt}} = \frac{1}{N} \sum_{i=1}^{N} \log p_{M_{null}}\left(D_{alt,i}\right) \qquad (1)$$

computes the average log-likelihood, where $N$ is the number of data points in $D_{alt}$. Finally,

$$LL_{D_{alt}} - LL_{D_{null}} < c \qquad (2)$$

evaluates whether there is a significant difference between model and data. $LL_{D_{null}}$ is the average log-likelihood of the null-data under $M_{null}$ itself and captures the model's inherent bias. The critical value $c$ controls for Type-1 errors (false positives) similar to other significance tests, e.g., $c = \log(0.001)$ indicates that 1 out of 1000 events is falsely considered to be significantly different from the model.

V. PRELIMINARY STUDY
This preliminary study shows how FL-PSM finds possible fault locations. We sent 3000 requests, based on data from the NHANES [9] dataset, to the Nutrition Advisor. The resulting model is the null-model $M_{null}$. Then, we seeded two errors into the Nutrition Advisor and collected the alt-datasets $D_{alt_1}$ and $D_{alt_2}$. The first error simulates a regression (between versions) caused by a typo in the Person constructor that assigns -weight instead of weight to the field. The second error simulates an integration fault (within a version) caused by miscommunication between teams using different units of measurement. Team A, which also built the null version of the Nutrition Advisor, computes the BMI based on meters, while Team B, which revises the implementation, computes it based on inches.

We used the computation from Section IV with a critical value (i.e., false-positive rate) of $c = \log(0.001) \approx -6.9$. This means that log-likelihood differences below $c$ are significantly diverging from the model.

A. Regression Fault
Figure 2 shows the runtime behavior of a subset of code elements of the Person.init and NutritionAdvisor.advice models. Table I lists the likelihood and significance of these elements along with the multivariate model likelihood that considers all elements at once. The visualization of the code elements allows developers to see that there is a significant difference between the model and the observations. The constructor parameter init.weight is aligned with the model, while the property writes to Person.weight are clearly different. This difference is also significant, as Table I shows (rows 1 and 4). The other elements in the same model do not differ significantly, as both the visualization and the table show.

The difference propagates to the dependent NutritionAdvisor.advice method, which reads the Person.weight property (rows 5 and 7). The invocation of BmiService.bmi also indicates this significant divergence (row 8).
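A minimal sketch of the test in Eqs. (1) and (2), applied to a simulated version of this regression. It rests on several assumptions: each code element gets a one-dimensional Gaussian (the hypothetical GaussianModel) instead of the NVPs that PSM uses, the requests are synthetic Gaussian (height, weight) pairs rather than NHANES data, and person_init, bmi, and trace are hypothetical helpers. The element names mirror Figure 2 and Table I, but the numbers are not those of the study.

```python
import math
import random
from statistics import fmean, stdev

# Critical value c = log(0.001): roughly 1 in 1000 events falsely flagged.
C = math.log(0.001)


class GaussianModel:
    """Hypothetical stand-in for a PM: a 1-D Gaussian instead of an NVP."""

    def __init__(self, data):
        self.mu = fmean(data)
        self.sigma = stdev(data)

    def log_pdf(self, x: float) -> float:
        z = (x - self.mu) / self.sigma
        return -0.5 * (z * z + math.log(2.0 * math.pi)) - math.log(self.sigma)


def mean_log_likelihood(model: GaussianModel, data) -> float:
    """Eq. (1): average log-likelihood of a dataset under a model."""
    return fmean(model.log_pdf(x) for x in data)


def is_significant(model: GaussianModel, d_null, d_alt, c: float = C) -> bool:
    """Eq. (2): flag an element if LL(D_alt) - LL(D_null) < c."""
    return mean_log_likelihood(model, d_alt) - mean_log_likelihood(model, d_null) < c


def person_init(height: float, weight: float, regressed: bool) -> dict:
    """Hypothetical Person constructor; the regressed version writes -weight."""
    return {"height": height, "weight": -weight if regressed else weight}


def bmi(height_cm: float, weight_kg: float) -> float:
    """Hypothetical BmiService.bmi: weight in kg over squared height in m."""
    return weight_kg / (height_cm / 100.0) ** 2


def trace(requests, regressed: bool) -> dict:
    """Record the runtime events of three code elements for a batch of requests."""
    events = {"init.weight": [], "Person.weight": [], "bmi.return": []}
    for height, weight in requests:
        person = person_init(height, weight, regressed)
        events["init.weight"].append(weight)              # constructor parameter
        events["Person.weight"].append(person["weight"])  # property write
        events["bmi.return"].append(bmi(person["height"], person["weight"]))
    return events


rng = random.Random(42)
# Synthetic (height cm, weight kg) requests; a placeholder for the NHANES data.
requests = [(rng.gauss(170.0, 10.0), rng.gauss(70.0, 15.0)) for _ in range(3000)]
null_events = trace(requests[:1500], regressed=False)  # D_null
alt_events = trace(requests[1500:], regressed=True)    # D_alt

for element, d_null in null_events.items():
    model = GaussianModel(d_null)  # M_null, fitted on the null trace
    print(element, is_significant(model, d_null, alt_events[element]))
```

With these synthetic inputs, init.weight should not be flagged, while Person.weight and the downstream bmi.return should be, mirroring the propagation described above. Using one univariate model per element keeps the sketch short; the multivariate rows of Table I would require a joint model such as an NVP.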
B. Integration Fault
Figure 3 shows the runtime behavior of a subset of code elements of Servlet.handleRequest calling Person.init. In this case, the Servlet.handleRequest model evaluates the parameters or return values of Person.init. The visualization shows no significant difference in the integration between the Servlet and Person. This insignificance is also reflected in Table II. The integration between NutritionAdvisor.advice and BmiService.bmi, with the former being the model, shows a difference in the return value of BmiService.bmi. Again, this difference is also reflected in Table II (rows 5 and 8).

[Figure 2: density plots (Source: Model vs. Observation) for Person.init (init.height, init.weight, Person.weight) and NutritionAdvisor.advice (Person.height, Person.weight, BmiService.bmi)]
Fig. 2. A subset of elements in the regression fault setting. For example, the first row shows Person.init parameters and property writes between the original version and a regressed version of the same component.

TABLE I
LIKELIHOOD VALUES OF A SUBSET OF ELEMENTS IN THE REGRESSION FAULT SETTING
Row  Model         Element  Cardinality   LL      Sig.
1    Person.init   init     multivariate  -6787   ✔
2                                                 ✖
3                                                 ✖
4                                                 ✔
5                                                 ✔
6                                                 ✖
7                                                 ✔
8                                                 ✔

VI. DISCUSSION
The preliminary study showed how FL-PSM localizes faults. This localization is automated via likelihood-based significance tests that allow for statistical control of the false-positive rate. The other important aspect is the visualization of the faults (Figures 2 and 3) and their impact on dependent elements. This allows for a precise analysis of the error chain and its influence across the program.

FL-PSM can only be applied if at least one version of the program exists from which a null-model can be fitted. This is not an issue from an industrial point of view, since FL-PSM can be used after a few development sprints. Another consideration is that FL-PSM localizes behavioral changes, including intended ones. These intended changes can be filtered out by incorporating source code change information into the localization process. In addition, the visualization capabilities of FL-PSM allow for quick manual inspection in cases of doubt.

In summary, the results and usability of FL-PSM are promising. Nevertheless, open questions remain concerning multiple fault sources and their clear separation.

TABLE II
LIKELIHOOD VALUES OF A SUBSET OF ELEMENTS IN THE INTEGRATION FAULT SETTING
Row  Model           Element      Cardinality   LL   Sig.
1    Servlet.handle  Person.init  multivariate  0    ✖
2                                                    ✖
3                                                    ✖
4                                                    ✖
5                                                    ✔
6                                                    ✖
7                                                    ✖
8                                                    ✔

VII. RELATED WORK
Most fault localization techniques are slice-, spectrum-, statistics-, model-, or machine-learning-based [1], [3]. The most similar technique to FL-PSM is Spectrum-Based Fault Localization (SBFL) [1]. SBFL techniques observe passing and failing executions and perform statistical inference on the results. The result is a ranked list of statements along with their likelihood of being the fault location. While similar, FL-PSM differs in its abstraction level and source model. PSM abstracts from statements and only considers properties, executables, and types along with their call dependencies. In contrast, SBFL techniques predominantly work on the statement level. This might seem like a drawback at first. However, Parnin and Orso [5] identified that the level of detail of the results, in combination with high false-positive rates, is one of the main reasons for the low industrial adoption of SBFL. PSM addresses these issues by providing control over the false-positive rate and by its coarser level of abstraction (executables).

[Figure 3: density plots (Source: Model vs. Observation) for Servlet.handleRequest -- Person.init (init.height, init.weight, init.gender) and NutritionAdvisor.advice -- BmiService.bmi (bmi.height, bmi.weight, bmi.return)]
Fig. 3. A subset of elements in the integration fault setting. For example, the first row shows Person.init parameter values caused by the invocation from Servlet.handleRequest.

VIII. CONCLUSION AND FUTURE WORK
We presented Fault Localization via Probabilistic Software Modeling (FL-PSM). FL-PSM builds upon PSM and uses statistical inference to find possible fault locations in a program. The localization is based on evaluating the likelihood of runtime events under the model. We have shown how FL-PSM localizes and visualizes faults. In addition, we discussed the differences between FL-PSM and its close relative SBFL.

Future work will focus on a full evaluation of the approach with multiple complex subsystems. Furthermore, we want to conduct a user study on its practicality and applicability. In conclusion, FL-PSM is a promising new FL approach built upon PSM, which provides a general framework for the probabilistic analysis of software programs.

ACKNOWLEDGMENTS
The research reported in this paper has been supported by the Austrian ministries BMVIT and BMDW, and the Province of Upper Austria in terms of the COMET - Competence Centers for Excellent Technologies Programme managed by FFG.

REFERENCES
[1] J. Jones, M. Harrold, and J. Stasko, "Visualization of test information to assist fault localization," in Proceedings of the 24th International Conference on Software Engineering. ICSE 2002, May 2002, pp. 467–477.
[2] W. E. Wong, V. Debroy, R. Gao, and Y. Li, "The DStar Method for Effective Software Fault Localization," IEEE Transactions on Reliability, vol. 63, no. 1, pp. 290–308, Mar. 2014.
[3] W. E. Wong, R. Gao, Y. Li, R. Abreu, and F. Wotawa, "A Survey on Software Fault Localization," IEEE Transactions on Software Engineering, vol. 42, no. 8, pp. 707–740, Aug. 2016.
[4] S. Pearson, J. Campos, R. Just, G. Fraser, R. Abreu, M. D. Ernst, D. Pang, and B. Keller, "Evaluating and Improving Fault Localization," May 2017, pp. 609–620.
[5] C. Parnin and A. Orso, "Are automated debugging techniques actually helping programmers?" in Proceedings of the 2011 International Symposium on Software Testing and Analysis - ISSTA '11. Toronto, Ontario, Canada: ACM Press, 2011, p. 199.
[6] H. Thaller, L. Linsbauer, R. Ramler, and A. Egyed, "Probabilistic Software Modeling: A Data-driven Paradigm for Software Analysis," arXiv:1912.07936 [cs], Dec. 2019.
[7] L. Dinh, J. Sohl-Dickstein, and S. Bengio, "Density estimation using Real NVP," arXiv:1605.08803 [cs, stat], May 2016.
[8] G. Papamakarios, E. Nalisnick, D. J. Rezende, S. Mohamed, and B. Lakshminarayanan, "Normalizing Flows for Probabilistic Modeling and Inference," arXiv:1912.02762 [cs, stat], Dec. 2019.
[9] CDC, "National Health and Nutrition Examination Survey Data."