An Analysis of Bug Distribution in Object Oriented Systems

Abstract

We introduce a new approach to describing Java software as a graph, where nodes represent Java files, called compilation units (CUs), and edges represent the relations between them. The software system is characterized by the degree distributions of the graph, such as in-links and out-links, and by the distributions of the Chidamber and Kemerer (CK) metrics computed on its CUs. Every CU can be related to one or more bugs during its life. We find a relationship between the software system and the bugs hitting its nodes. We found that the distributions of some metrics, and of the number of bugs per CU, exhibit power-law behavior in their tails, as does the distribution of the number of CUs affected by a single bug. We examine the evolution of software metrics across different releases to understand how the relationships between CU metrics and CU faultiness change over time.


arXiv preprint [cs.SE], May

Alessandro Murgia, Giulio Concas, Michele Marchesi, Roberto Tonelli and Ivana Turnu. Department of Electrical and Electronic Engineering, University of Cagliari, piazza d'Armi, 09123 Cagliari, Italy.

Key words: software graphs, object-oriented programming, statistical methods, complexity measures, software metrics, bug distribution.

1. INTRODUCTION

Large software systems can be represented as graphs so large and intricate that they can be studied using complex network theory. In object-oriented (OO) software systems, the nodes are the classes or interfaces, and the directed edges are the various kinds of relationships between them: inheritance, composition and dependence. For OO systems there also exist some consolidated software metrics, which can be associated with the graph and are usually computed at class level; the most widely used is the Chidamber and Kemerer (CK) suite of metrics [1]. The relationship between metrics and software quality is fuzzy, and is still the subject of ongoing research.

Software bugs are closely related to software quality. Several researchers have analysed software evolution in order to understand the relationship between software management and bug issues. Purushothaman et al. [2] analyzed the software development process to identify the relationships between small changes to the code and bug growth. Kim et al. [3] analyzed micro-pattern evolution in Java classes to identify which micro-patterns are more bug-prone. Śliwerski et al. [4] analyzed fix-inducing changes, i.e. software updates that trigger the appearance of bugs. In their work, the revision history associated with compilation units (CUs) was examined to understand where bugs are introduced during CU evolution. Compilation units, the basic blocks examined in this paper, are files containing one or more classes, for which it is possible to compute software metrics similar to those used for classes.

To our knowledge, a complete analysis of the relationships between the graph properties of large software systems, the statistics of software metrics, and the introduction and distribution of bugs in such graphs is still missing. Zimmermann et al. [5] applied network analysis to dependency graphs built on binary files, and studied how dependencies correlate with, and predict, defects. Andersson et al. [6] discussed the Pareto distribution of bugs in classes, without going into the details of the statistical properties of the software that determine such a distribution. Zhang found that the bug distribution across packages in the Eclipse Java system appears to follow a Weibull distribution [7].

The aim of this paper is to study OO systems using complex network theory, in order to improve knowledge of the causes of bugs and to statistically determine their distribution within the system. We extend the definitions of CK software metrics to CUs to understand the evolution of faultiness, i.e. how a metric variation affects the number of bugs hitting a CU. A deeper understanding of the dynamics of software development could help software engineers identify which system components will be more prone to bugs, and thus focus testing and code reviews on those components.

We also study the time evolution of software systems and of the related graphs and metrics, analysing both the source code and the bugs of various releases of two large Java systems, Eclipse [8] and Netbeans [9]. For each release we computed the associated software graph and the CK metrics for each class. Furthermore, we study the number of defects associated with CUs, as found in the bug-tracking system used for development.

We computed the correlation between OO metrics and bugs and analyzed the evolution of these metrics from one release to the next, correlating metric changes with the number of defects. We present a classification scheme of CUs into categories which allows us to identify which parts of the software are the most fault-prone, and how these are correlated with CK software metrics. We support our findings with significance tests.

2. Method

We analyze the source code of two object-oriented systems written in Java, Eclipse and Netbeans. Both use CVS as their version control system. Eclipse uses Bugzilla as its issue tracking system, while Netbeans uses Issuezilla. CVS keeps track of the source code history, while Bugzilla and Issuezilla keep track of the bug history.

A directed graph is associated with an OO software system, where the nodes are the classes and interfaces, and the edges are the relationships between classes, namely inheritance, composition and dependence. The number and orientation of the edges allow us to study the coupling between nodes. In this graph, the in-degree of a class is the number of edges directed toward the class, and measures how much the class is used by other classes of the system. The out-degree of a class is the number of edges leaving the class, and represents the extent to which the class uses other classes in the system. In this context, the CK suite is a commonly used set of metrics for class analysis. We calculated for each node the values of the four most relevant CK metrics of the associated class:

• Weighted Methods per Class (WMC). A weighted sum of all the methods defined in a class. We set the weighting factor to one to simplify our analysis.
• Coupling Between Objects (CBO). The number of classes to which a given class is coupled.
• Response For a Class (RFC). The sum of the number of methods defined in the class and the cardinality of the set of methods called by them and belonging to external classes.
• Lack of Cohesion of Methods (LCOM). The difference between the number of non-cohesive method pairs and the number of cohesive pairs.

We also computed the lines of code of the class (LOC), excluding blank and comment lines. This is useful to keep track of CU size, because it is known that a "long" class is more difficult to manage than a short one.

Every system class resides inside a Java file, called a CU. While most files include just one class, some files include more than one class. In Eclipse 10% of CUs host more than one class, whereas in Netbeans this percentage is 30%. In commit messages, issues and issue fixes always refer to CUs. To make issue tracking consistent with the source code, we decided to extend CK metrics from classes to CUs.
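The LCOM definition above can be made concrete with a small sketch. The input format (each method mapped to the set of instance fields it uses) is an assumption, and so is the clipping at zero: Chidamber and Kemerer's original definition sets LCOM to zero when cohesive pairs outnumber non-cohesive ones, while the text here only speaks of the difference. How LCOM is lifted from classes to CUs is not stated in this excerpt, so the sketch stays at class level.

```python
from __future__ import annotations

from itertools import combinations


def lcom(method_fields: dict[str, set[str]]) -> int:
    """LCOM of one class.

    method_fields maps each method name to the set of instance fields it uses
    (hypothetical input format). P counts method pairs sharing no field, Q pairs
    sharing at least one; LCOM = P - Q, clipped at zero as in the CK definition.
    """
    p = q = 0
    for fields_a, fields_b in combinations(method_fields.values(), 2):
        if fields_a & fields_b:
            q += 1  # cohesive pair
        else:
            p += 1  # non-cohesive pair
    return max(p - q, 0)


# Usage: deposit/withdraw share `balance`; rename and audit share nothing.
account = {
    "deposit": {"balance"},
    "withdraw": {"balance"},
    "rename": {"owner"},
    "audit": {"log"},
}
print(lcom(account))  # prints 4 (P = 5, Q = 1)
```

Representing methods by the fields they touch keeps the pair test a single set intersection; a real extractor would have to build these sets from the parsed source.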
CUs therefore represent the main element of our study. We defined a CU graph whose nodes are the CUs of the system. Two nodes are connected by a directed edge if at least one class inside the CU associated with the first node has a dependency relationship with a class inside the CU associated with the second node. We refer to this graph when computing the in-links and out-links of a CU node. We reinterpreted the CK metrics on this CU graph:

• CU LOCS is the sum of the LOCS of the classes contained in the CU; …
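The CU graph construction and the CU-level LOCS can be sketched as follows. The input format is an assumption: a map from each class name to the file (CU) that contains it, a list of class-level dependency pairs, and per-class LOC counts. Dependencies between classes of the same CU are dropped here, a choice the text does not specify.

```python
from __future__ import annotations

from collections import defaultdict


def build_cu_graph(class_to_cu: dict[str, str],
                   class_deps: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Directed CU graph: cu_a -> cu_b if some class in cu_a depends on a class in cu_b.

    Dependencies between classes of the same CU are dropped (assumption).
    """
    graph: dict[str, set[str]] = {cu: set() for cu in class_to_cu.values()}
    for src, dst in class_deps:
        cu_src, cu_dst = class_to_cu[src], class_to_cu[dst]
        if cu_src != cu_dst:
            graph[cu_src].add(cu_dst)  # parallel class edges collapse into one
    return graph


def link_degrees(graph: dict[str, set[str]]) -> tuple[dict[str, int], dict[str, int]]:
    """Return (in_links, out_links) for every CU node."""
    out_links = {cu: len(targets) for cu, targets in graph.items()}
    in_links = dict.fromkeys(graph, 0)
    for targets in graph.values():
        for cu in targets:
            in_links[cu] += 1
    return in_links, out_links


def cu_locs(class_to_cu: dict[str, str], class_locs: dict[str, int]) -> dict[str, int]:
    """CU LOCS = sum of the LOCS of the classes contained in the CU."""
    totals: dict[str, int] = defaultdict(int)
    for cls, loc in class_locs.items():
        totals[class_to_cu[cls]] += loc
    return dict(totals)
```

Using sets for the adjacency means that several class-level dependencies between the same pair of files count as a single CU edge, which matches the "at least one class" wording of the definition.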

On the CU graph, we look for nodes hit by issues. To obtain this information it is necessary to check the CVS log file and the data contained in the issue tracking system (ITS). We consider a CU as affected by an issue when it is modified to fix that issue. Developers record all fixing activities in the CVS log. All commit operations are tracked in the CVS log as single entries. Each entry contains various data, among which the date, the developer who made the changes, an annotation giving the reasons for the commit, and the list of CUs involved in the commit. When a commit is associated with an issue-fixing activity, this is written in the annotation, though not in a standardized way. Obtaining a correct mapping between issues and the related CUs is not simple [4] [10].

In our approach, we first analyzed the CVS log to locate commit messages associated with fixing activities. The extracted data were then matched with information found in the ITS. Each issue is identified by a positive integer (ID). In commit messages, the ID may appear in a string such as "Fixed 141181" or "bug …
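Matching commit annotations against the ITS can be sketched with a regular expression. The pattern below is a hypothetical heuristic, not the authors' parser, whose rules are not given in this excerpt: it picks up numbers that follow words such as "fix", "fixed", "bug" or "issue", and then keeps only IDs that actually exist in the ITS, which filters out unrelated numbers. The commit representation (annotation plus list of touched CUs) is also an assumption.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical heuristic: a number preceded by "fix", "fixes", "fixed", "bug" or
# "issue" (case-insensitive), optionally with "#", e.g. "Fixed 141181".
BUG_REF = re.compile(r"\b(?:fix(?:e[sd])?|bug|issue)\s*#?\s*(\d+)", re.IGNORECASE)


def candidate_ids(message: str) -> set[int]:
    """All numbers in a commit annotation that look like issue references."""
    return {int(num) for num in BUG_REF.findall(message)}


def link_bugs_to_cus(commits: list[tuple[str, list[str]]],
                     its_ids: set[int]) -> dict[str, set[int]]:
    """Map each CU to the ITS issue IDs of the fixing commits that touched it.

    commits: (annotation, CUs modified by the commit) pairs from the CVS log.
    Only IDs that exist in the issue tracker are kept.
    """
    cu_bugs: dict[str, set[int]] = defaultdict(set)
    for message, cus in commits:
        ids = candidate_ids(message) & its_ids
        if not ids:
            continue  # not a recognisable fixing commit
        for cu in cus:
            cu_bugs[cu] |= ids
    return dict(cu_bugs)
```

The number of bugs per CU then follows as `{cu: len(ids) for cu, ids in link_bugs_to_cus(...).items()}`. Storing IDs in sets ensures that several commits fixing the same bug in the same CU are counted once.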

3. Results

The subjects of our study were the Eclipse and Netbeans projects, both open-source, object-oriented, Java-based systems. Tables I and II show the number of CUs in the main releases of Eclipse and Netbeans, respectively.

Table II: Number of CUs of Netbeans for each main release

Release          3.2    3.3    3.5    3.6    4.0    5.0     6.0
Number of CUs    3350   4421   7391   8350   9365   12137   37145

A software system usually evolves through subsequent releases. Main releases entail substantial enhancements of the system, and are usually characterized by significant changes in software size, as shown by the data reported in Tables I and II. Between two main releases there may be several "patching releases", intended to fix bugs and to provide minor enhancements. Although we analyzed all the releases, we report results for the main releases and for the patching release immediately preceding the next main release. In fact, most bugs are introduced in the upgrade from the last patching release to the next main release.

We computed the statistical distributions of the software metrics underlying the software graph. We compared the metrics for software graphs built using classes as basic units, already studied in the literature, with those obtained in this work for software graphs built on CUs. The latter distributions substantially preserve the "fat-tail" behavior of the corresponding class metrics [11] in all cases. Fig. 1 reports the log-log plot of the complementary cumulative distribution function (CCDF) of the CBO metric of Eclipse 3.2 for classes and for CUs. Fig. 2 reports the CCDF of the CBO metric, this time for Netbeans 4.0. All these distributions exhibit power-law behavior in their tail.

Figure 1: The CCDF of the CBO metric for classes (crosses) and CUs (stars) in Eclipse 3.2 (log-log axes: x vs. Pr(X > x)).
Figure 2: The CCDF of the CBO metric for classes (crosses) and CUs (stars) in Netbeans 4.0 (log-log axes: x vs. Pr(X > x)).

We recall that a quantity x obeys a power law if it is drawn from a probability distribution proportional to a negative power of x:

$$p(x) \propto x^{-\gamma}, \qquad \gamma > 1 \tag{1}$$

where γ is the power-law coefficient, also known as the exponent or scaling parameter. The corresponding complementary cumulative distribution function (CCDF), i.e. the probability that the random variable is greater than or equal to a given value x, is:

$$P(X \ge x) \propto x^{-(\gamma - 1)} \tag{2}$$

A power-law, or Pareto, distribution cannot hold for x = 0, so eligible values of x must be greater than a positive number x_min. This allows us to consider distributions that are power laws only in their right "tail", that is, for x greater than a given value x_min, and not for lower values of x. All the distributions shown in Figs. 1, 3 and 4 show straight-line behavior in their right tail. Note that the CCDF has the same analytical form as the distribution function, with the negative exponent offset by one.
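The CCDF of Eq. (2) and the exponent γ can be obtained from raw metric values as in the following sketch. It illustrates the technique and is not the authors' fitting procedure, which is not described here: `ccdf` computes the empirical Pr(X ≥ x), and `powerlaw_gamma` applies the continuous maximum-likelihood estimator γ = 1 + n [Σ ln(x_i / x_min)]^(-1) given by Newman [15] to the values at or above a chosen x_min. For integer-valued metrics such as CBO or bug counts, this continuous formula is only an approximation, and choosing x_min is left to the caller.

```python
from __future__ import annotations

import math


def ccdf(values: list[float]) -> list[tuple[float, float]]:
    """Empirical CCDF: (x, Pr(X >= x)) for each distinct observed value x."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        if i == 0 or x != xs[i - 1]:
            points.append((x, (n - i) / n))  # the n - i values from index i on are >= x
    return points


def powerlaw_gamma(values: list[float], x_min: float) -> float:
    """Continuous maximum-likelihood estimate of gamma in p(x) ~ x**-gamma,
    using only the tail x >= x_min (estimator from Newman [15])."""
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("tail is empty or has no spread above x_min")
    return 1.0 + len(tail) / log_sum
```

Plotting the pairs returned by `ccdf` on log-log axes should give a straight tail when the data follow Eq. (2); the maximum-likelihood estimate is generally preferred to a least-squares fit of that line, which is why the sketch uses it.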
Plotting p(x) or P(x) on a log-log scale yields a straight line, as shown in Figs. 1 and 2. Figs. 3 and 4 show the CCDF of the WMC metric in Eclipse 3.2 and in Netbeans 4.0, respectively. These distributions are also quite similar, and again present power-law behavior in their tail, both for classes and for CUs. We found this behavior in all other releases as well, and for all metrics.

The finding that the distributions of CU metrics largely coincide with those of the corresponding class metrics suggests that the same considerations that are valid for CUs may be extended to classes, even in cases where data for classes are not directly accessible, as is the case here for bugs. One goal of this paper is, in fact, to find, by means of the software graph framework, existing correlations between bugs and metrics. Since bug information for classes cannot be extracted directly from the repository, we analyzed the bug metric only for CUs, and used this information to obtain clues about classes.

Fig. 5 shows the CCDF of the number of bugs per CU in Eclipse 3.2. Fig. 6 shows the same distribution in Netbeans 3.4. The meaning of these power-law tail distributions is unequivocal: while most CUs have only very few bugs, a non-negligible number of CUs have very many bugs. We also found similar shapes in all other main releases.

Figure 3: The CCDF of the WMC metric for classes (crosses) and CUs (stars) in Eclipse 3.2 (log-log axes: x vs. Pr(X > x)).
Figure 4: The CCDF of the WMC metric for classes (crosses) and CUs (stars) in Netbeans 4.0 (log-log axes: x vs. Pr(X > x)).
Figure 5: The CCDF of the number of bugs per CU in Eclipse 3.2 (log-log axes: bug count x vs. Pr(X > x)).
Figure 6: The CCDF of the number of bugs per CU in Netbeans 3.4 (log-log axes: bug count x vs. Pr(X > x)).

On the basis of these similarities, it seems reasonable to hypothesize that the power laws found for the bug distribution among CUs may be extended to classes, as well as to other units such as modules or packages, and that they are a property of the graph structure of the system. In fact, similar results were obtained by Andersson and Runeson [6] and by Zhang [7]. Andersson and Runeson suggest a Pareto law governing the distribution of bugs across the basic units of a software system that is only partially OO, showing that a few modules contain most of the bugs (the 20-80 rule [12]). Zhang re-examined their results for the Eclipse software system, studying packages instead of modules, and found that a Weibull distribution fits the data better than a power law. Since the tail of a Weibull distribution is often indistinguishable from a power-law tail, their results support our hypothesis.

Let us point out what we consider our most relevant finding. We verified that a power-law distribution may be appropriate to describe the fat-tail distributions of different quantities. Note that the fat tail contains the software units to which most of the information belongs. When a metric is distributed according to a power law, even only in its tail, with a small enough scaling exponent, there are relatively few units with the highest values of the metric, where criticality resides, while most other units are much less critical. The 80-20 Pareto principle is a consequence of this: about 80% of the criticality is held in 20% of all units.

Our analysis is finer than those performed in [6] or in [7], in the sense that we analyzed the software structure and relationships at the level of compilation units, one level deeper than the module or package level used in those works. This allowed us to recover finer information on the distributions of metrics, especially in their tail. Our results confirm those of Andersson and Runeson, and of Zhang, showing that the same framework holds at different scales, exhibiting a scale-free structure [13]. This finding qualitatively supports the use of power laws. Finally, Louridas et al. [14] also show a large variety of cases in which power laws account well for the distribution of different software properties.

As for the value of the exponent γ of the distribution of the number of bugs per CU, it tends to lie between 2.5 and 3.5 in the various releases examined, for both Eclipse and Netbeans.

According to [14], a mathematical description of the fat tail may have relevant consequences for software engineering, for example in helping to select carefully which parts of a software project deserve more care and effort, also from an economic point of view. For instance, given n modules characterized by a metric distributed according to a power law with exponent γ, the expected maximum value of the metric, attained by the module with the highest value, ⟨x_max⟩, scales as [15]

$$\langle x_{\max} \rangle \sim n^{1/(\gamma - 1)} \tag{3}$$

This formula provides a definite expectation of the order of magnitude of the maximum value taken by the metric, and hence allows specific modules with metric values of this order of magnitude to be flagged.

We also studied the distribution of the number of CUs hit by a single bug, the dual of the distribution of bugs across CUs. In this case too we find a power law, as shown in Figs. 7 and 8 for Eclipse and Netbeans, respectively. This means that, while most bugs affect just one or a few CUs, there are bugs that affect tens, or even hundreds, of CUs.

Figure 7: The CCDF of the number of CUs associated with each bug in Eclipse 3.2 (log-log axes: number of CUs x vs. Pr(X > x)).

Figure 8: The CCDF of the number of CUs associated with each bug in Netbeans 3.4.

The value of the exponent γ of the distributions of the number of CUs affected by a bug lies consistently between 2.2 and 2.9 in all considered releases, for both Eclipse and Netbeans, meaning an even "fatter" tail for this distribution than for the previously studied distribution of bugs per CU.

The finding that the distribution of bugs across CUs follows a power law may suggest a model for the introduction and spread of bugs in the software system. As already specified, in our investigation we call "bug" each numerical identifier found in the repository and associated with software "fixing". Thus, generally speaking, a bug reported in a CU means that the CU needed to be partially modified because of this bug. Now let us consider the graph structure of the software system. We, and many other authors in the literature, verified that such graphs have an organized structure, exhibiting power-law distributions for many properties of the system. In particular, there are nodes linked with many other nodes, playing the role of "hubs" of the system. For example, there are a few CUs with a large number of in-links, meaning that they are extensively used by other CUs. If a bug hits such a CU, that is, if the CU code needs modification, it is very likely that the code of the CUs linked to that node also needs to be modified. Such a mechanism may generate a sort of defect propagation in the software graph, very similar to the spread of a contagious disease. The system gets infected by bugs, and a single bug may affect many different CUs if it propagates from a hub node. Conversely, bugs in CUs with very few links will likely remain confined to a small number of CUs. Our heuristic conclusion is that the power laws observed for the bug distribution are probably due to the scale-free structure of the software graph.
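The contagion mechanism described above can be illustrated as a reachability computation on the CU graph. This is a hypothetical model, not one the paper implements: it assumes that fixing a bug in one CU forces changes in the CUs that depend on it (its in-link neighbours), up to a chosen number of hops. It shows why a bug in a hub tends to touch many CUs while a bug in a weakly linked CU stays confined.

```python
from __future__ import annotations

from collections import deque


def dependents(graph: dict[str, set[str]]) -> dict[str, set[str]]:
    """Reverse the CU graph: for each CU, the CUs that depend on it (its in-links)."""
    rev: dict[str, set[str]] = {cu: set() for cu in graph}
    for src, targets in graph.items():
        for dst in targets:
            rev.setdefault(dst, set()).add(src)
    return rev


def affected_cus(graph: dict[str, set[str]], buggy_cu: str, hops: int = 1) -> set[str]:
    """Hypothetical propagation model: a fix in buggy_cu also forces changes
    in every CU that depends on it, up to `hops` dependency levels away."""
    rev = dependents(graph)
    seen = {buggy_cu}
    queue = deque([(buggy_cu, 0)])
    while queue:
        cu, depth = queue.popleft()
        if depth == hops:
            continue  # do not spread beyond the hop limit
        for user in rev.get(cu, ()):
            if user not in seen:
                seen.add(user)
                queue.append((user, depth + 1))
    return seen
```

Applied to every bug, the sizes of these affected sets give a synthetic "CUs per bug" distribution whose tail is driven by the in-degree distribution; whether it reproduces the observed exponents is an open question that the sketch does not settle.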
Bugs propagate inside a constraining framework, which determines their diffusion across the software system.

From the software engineering point of view, the usefulness of finding power laws in the tail of the bug distribution may be illustrated following the reasoning of Louridas et al. [14]. Once it is shown that the bug distribution across CUs follows a power law, the CUs in the tail may be identified as the most fault-prone. Thus, after a new release is issued, the inspection of CUs for bug detection may take advantage of this information. For instance, inspecting the 5% highest-ranked CUs would cover a high percentage of bugs, where the exact percentage depends on the power-law exponent.

We analyzed, for each version of the system, the correlations between the considered software metrics and the number of bugs. This information may be used to understand, from the measured value of a metric, which parts of the software are most affected by faults, and to devise possible strategies to control metric values during software development, with the goal of reducing bug introduction.

Our analysis started by computing, for various releases R_i of the system, the linear correlation between a particular CK metric and the number of bugs in the same CUs. This is only a preliminary analysis, aimed at identifying which CU metrics are most related to fault-proneness. We recall that developers distinguish between "main" and "patching" releases, and that changes from one main release to the next are usually substantial, also with regard to metrics.

In the first part of our study we considered the main releases. In the Eclipse project, main releases are identified by two-digit numbers: Eclipse 2.1, Eclipse 3.0, Eclipse 3.1, Eclipse 3.2 and Eclipse 3.3.
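The per-release linear correlations described above (reported for Eclipse in Table III) are Pearson coefficients between a CU metric and the CU bug count. A minimal sketch, assuming metric values and bug counts are held in plain dictionaries keyed by CU name (a hypothetical layout) and treating CUs with no linked bug as having zero bugs:

```python
from __future__ import annotations

import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson linear correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def metric_bug_correlations(metrics: dict[str, dict[str, float]],
                            bugs: dict[str, int]) -> dict[str, float]:
    """One release: correlation of each metric with the bug count, over the
    CUs for which the metric is known. CUs without linked bugs count as 0."""
    result = {}
    for name, per_cu in metrics.items():
        cus = sorted(per_cu)
        result[name] = pearson([per_cu[cu] for cu in cus], [bugs.get(cu, 0) for cu in cus])
    return result
```

Because both metrics and bug counts are fat-tailed, Pearson coefficients are sensitive to the few extreme CUs; a rank correlation would be a natural cross-check, although the tables here report Pearson values.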
We analyzed what can be deduced about bugs from the software metrics of this kind of release. Table III shows the correlations between metrics and bugs for the main releases of Eclipse. The metrics showing the highest correlation with bugs are those that take into account the number of dependencies on other CUs, namely CBO and RFC. This highlights the importance of analyzing a software system as a graph. The out-links metric is less correlated with bugs than CBO and RFC. The out-links metric includes not only dependency relationships, but also inheritance and implements relationships. The lower correlation of this metric with bugs may be interpreted as a greater ability of dependency relationships, compared with the other relationships, to propagate bugs.

Table III: Pearson correlations between metrics and bugs for some releases of Eclipse

Metric pair      2.1    3.0    3.1    3.2    3.3
bugs-LOCS        0.49   0.57   0.54   0.58   0.48
bugs-CBO         0.55   0.53   0.55   0.55   0.42
bugs-RFC         0.59   0.48   0.44   0.56   0.45
bugs-WMC         0.48   0.45   0.38   0.48   0.40
bugs-LCOM        0.30   0.21   0.15   0.34   0.24
bugs-inlinks     0.1    0.17   0.25   0.28   0.24
bugs-outlinks    0.47   0.38   0.40   0.55   0.42

… R_i of the Netbeans system. In Netbeans the distinction between main and patching releases is fuzzier than in Eclipse; moreover, there are various main releases (MR) that are not followed by classic patching releases (PR).

A comparison of Tables III and IV shows that the correlation values between metrics and number of bugs are usually lower in Netbeans than in Eclipse.
However, in both systems LOCS and RFC are the two metrics most correlated with CU faultiness, while LCOM shows, in both cases, a weak correlation with CU faultiness.

These results show that:
• given a release, some metrics are more correlated with CU faultiness than others;
• considering all releases, no single CK metric is the most correlated one in every release;
• given a metric, its correlation with the number of bugs changes from release to release.

We also analyzed the evolution of the metrics between two consecutive releases. To this purpose we define different types of CUs, distinguishing among updated, unmodified and newly introduced CUs, and defining all these types with respect to each of the different metrics. In particular, given a release R_i, the next release R_{i+1}, and a metric M, we classified the compilation units into four categories:
• CU.X is the set of compilation units where metric M does not change between R_i and R_{i+1};
• CU.U is the set of compilation units where metric M changes (Updated);
• CU.A is the set of compilation units that exist in R_{i+1} but not in R_i (Added);
…

It must be pointed out that the U and X categories are defined relative to a specific metric. A CU might exhibit a change in metric M but not in metric M' between releases R_i and R_{i+1}. Thus, it will belong to class CU.U for M, and to class CU.X for M'. This case is not common, but it is definitely possible. CU.A is defined regardless of any metric M, since it refers to CUs just introduced in the new release. There are also CUs existing in release R_i but not in release R_{i+1}.
These deleted CUs are not considered in our study.

Given the sets of compilation units belonging to the three categories CU.U, CU.X and CU.A, we compute:
• the fraction of compilation units affected by bugs, which provides an infection probability;
• the average number of bugs of the infected compilation units.

In Table V we show, for various release transitions, the probability that CUs belonging to each of the families U, X and A are infected. The probability that a CU belonging to family CU.U is infected is between 0.6 and 0.7 in Eclipse. This means that there is a high probability that changing the LOCS, CBO or LCOM metrics of a CU from one release to the next results in injecting at least one error into the compilation unit. This result confirms Purushothaman's study [2], which highlighted that code corrections for defects often introduce new defects. The CUs added to the system in the transition from R_i to R_{i+1} also show a high probability of being infected, clearly larger than for unmodified CUs (set CU.X), and slightly smaller than for the set CU.U. Similar results were obtained for all other metrics.

On the contrary, if the metric does not change, there is a low probability that a CU is affected by bugs. These bugs clearly refer to bugs already present in R_i but found only when checking release R_{i+1}.

To support our findings about the deep differences among the CU.U, CU.X and CU.A families, we performed chi-square significance tests. We formulate the following null hypothesis: "the subdivision of CUs into U, X and A does not significantly influence the number of infected …
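The U/X/A classification, the infection statistics and the significance test can be sketched together. Assumptions: values of metric M are available per CU for both releases, a CU counts as infected if at least one bug is linked to it, and the chi-square test is run on a 3x2 contingency table (category vs. infected/clean), which is one reading of the truncated null hypothesis above. With (3-1)(2-1) = 2 degrees of freedom the chi-square survival function is exactly exp(-χ²/2), so no statistics library is needed; the usual caveat about small expected counts still applies.

```python
from __future__ import annotations

import math

CATEGORIES = ("U", "X", "A")


def classify(prev: dict[str, float], curr: dict[str, float]) -> dict[str, str]:
    """Label CUs of R_{i+1} for one metric M: A = added, U = M changed, X = M unchanged.
    CUs present only in R_i (deleted) are ignored, as in the text."""
    labels = {}
    for cu, value in curr.items():
        if cu not in prev:
            labels[cu] = "A"
        elif value != prev[cu]:
            labels[cu] = "U"
        else:
            labels[cu] = "X"
    return labels


def _members(labels: dict[str, str], cat: str) -> list[str]:
    return [cu for cu, c in labels.items() if c == cat]


def infection_stats(labels: dict[str, str], bugs: dict[str, int]) -> dict[str, tuple[float, float]]:
    """Per category: (fraction of infected CUs, mean bugs of the infected CUs)."""
    stats = {}
    for cat in CATEGORIES:
        members = _members(labels, cat)
        infected = [bugs[cu] for cu in members if bugs.get(cu, 0) > 0]
        prob = len(infected) / len(members) if members else float("nan")
        mean = sum(infected) / len(infected) if infected else 0.0
        stats[cat] = (prob, mean)
    return stats


def chi_square_test(labels: dict[str, str], bugs: dict[str, int]) -> tuple[float, float]:
    """Pearson chi-square on the 3x2 table category x {infected, clean}.
    With 2 degrees of freedom the p-value is exactly exp(-chi2 / 2)."""
    table = []
    for cat in CATEGORIES:
        members = _members(labels, cat)
        infected = sum(1 for cu in members if bugs.get(cu, 0) > 0)
        table.append((infected, len(members) - infected))
    total = sum(sum(row) for row in table)
    col_totals = [sum(row[j] for row in table) for j in range(2)]
    if total == 0 or 0 in col_totals or any(sum(row) == 0 for row in table):
        raise ValueError("every category and both outcomes need at least one CU")
    chi2 = 0.0
    for row in table:
        for j in range(2):
            expected = sum(row) * col_totals[j] / total
            chi2 += (row[j] - expected) ** 2 / expected
    return chi2, math.exp(-chi2 / 2)
```

Because U and X depend on the metric, `classify` is called once per metric while the bug map stays the same, which mirrors the per-metric definition of the categories in the text.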

4. Conclusion

From the software engineering perspective, a statistical description of large software systems as directed graphs can provide much additional information on system features compared with more traditional approaches. Adopting a graph as a model for the software system, we used compilation units as the basic software modules to build a software graph, and redefined the CK suite of metrics to cope with CUs. These metrics were then used to investigate, through statistical analysis, how and where bugs were introduced into two large OO software projects, Eclipse and Netbeans. We wrote two different parsers to analyze the CVS log file and the issue tracker repositories in order to automatically associate bugs with CUs. In this paper, we introduced the concept of a compilation unit graph, and of OO metrics related to compilation units, with the purpose of analyzing software projects managed using a configuration management system and a corresponding bug tracking system.

The representation of the software system as a graph allowed us to detect fat-tail distributions, well described by power laws, for different features of the system, suggesting the same general underlying framework as many other complex networks. In particular, we found that the distribution of bugs among CUs, the number of CUs affected by each bug, and the metric distributions (namely LOCS, the number of in-links and out-links of the class graph, and the CK metrics WMC, CBO, RFC and LCOM) all exhibit power-law fat tails.

Within this framework it is possible to identify strong correlations between bugs and the metrics related to the number of external dependencies, which, in the graph representation, are easily described as directed links.
All these findings together indicate a possible strategy to optimize resources and effort in software engineering for finding, forecasting and fixing software defects. Once the software graph reveals the fat tail in the relationship between bugs and CUs, one may identify which parts of the software are the most fault-prone and focus fixing efforts on them. Following [14], if one ranks CUs according to these power laws, reviewing a small fraction of the highest-ranked CUs may have a disproportionately large impact on the overall number of software defects that can be detected and fixed.

1. Chidamber S.R., Darcy D.P., Kemerer C.F. Managerial Use of Metrics for Object-Oriented Software: An Exploratory Analysis. IEEE Trans. Software Eng., vol. 24, no. 8, pp. 629-639, 1998.
2. Purushothaman R., Perry D.E. Toward Understanding the Rhetoric of Small Source Code Changes. IEEE Trans. Software Eng., vol. 31, no. 6, June 2005.
3. Kim S., Pan K., Whitehead E.J. Jr. Micro Pattern Evolution. MSR'06, Shanghai, China, May 22-23, 2006.
4. Śliwerski J., Zimmermann T., Zeller A. When Do Changes Induce Fixes? Proc. International Workshop on Mining Software Repositories (MSR'05), St. Louis, Missouri, USA, May 2005.
5. Zimmermann T., Nagappan N. Predicting Defects Using Network Analysis on Dependency Graphs. ICSE'08, Leipzig, Germany, May 10-18, 2008.
6. Andersson C., Runeson P. A Replicated Quantitative Analysis of Fault Distributions in Complex Software Systems. IEEE Trans. Software Eng., vol. 33, no. 5, pp. 273-286, May 2007.
7. Zhang H. On the Distribution of Software Faults. …
…
10. … Analyzing and Relating Bug Report Data for Feature Tracking. Proc. 10th Working Conference on Reverse Engineering (WCRE'03), Victoria, British Columbia, Canada, Nov. 2003. IEEE.
11. Concas G., Marchesi M., Pinna S., Serra N. Power-Laws in a Large Object-Oriented Software System. IEEE Trans. Software Eng., vol. 33, no. 10, pp. 687-708, 2007.
12. Juran J.M., Gryna F.M. Jr. Quality Control Handbook, 4th ed. McGraw-Hill, 1988.
13. Barabási A., Albert R. Emergence of Scaling in Random Networks. Science, vol. 286, pp. 509-512, 1999.
14. Louridas P., Spinellis D., Vlachos V. Power Laws in Software. ACM Transactions on Software Engineering and Methodology, vol. 18, no. 1, September 2008.
15. Newman M.E.J. Power Laws, Pareto Distributions and Zipf's Law. Contemporary Physics, vol. 46, pp. 323-351, 2005.