Investigating and Recommending Co-Changed Entities for JavaScript Programs
IInvestigating and Recommending Co-Changed Entitiesfor JavaScript Programs
Zijian Jiang a , Hao Zhong b , Na Meng a, ∗ a Virginia Polytechnic Institute and State University, Blacksburg VA , USA b Shanghai Jiao Tong University, Shanghai , China
Abstract
JavaScript (JS) is one of the most popular programming languages due to itsflexibility and versatility, but maintaining JS code is tedious and error-prone. Inour research, we conducted an empirical study to characterize the relationshipbetween co-changed software entities (e.g., functions and variables), and built amachine learning (ML)-based approach to recommend additional entity to editgiven developers’ code changes. Specifically, we first crawled 14,747 commits in10 open-source projects; for each commit, we created one or more change depen-dency graphs (CDGs) to model the referencer-referencee relationship betweenco-changed entities. Next, we extracted the common subgraphs between CDGsto locate recurring co-change patterns between entities. Finally, based on thosepatterns, we extracted code features from co-changed entities and trained anML model that recommends entities-to-change given a program commit.According to our empirical investigation, (1) three recurring patterns com-monly exist in all projects; (2) 80%–90% of co-changed function pairs eitherinvoke the same function(s), access the same variable(s), or contain similarstatement(s); (3) our ML-based approach CoRec recommended entity changeswith high accuracy (73%–78%). CoRec complements prior work because it sug-gests changes based on program syntax, textual similarity, as well as softwarehistory; it achieved higher accuracy than two existing tools in our evaluation.
Keywords:
Multi-entity edit, change suggestion, machine learning, JavaScript
1. Introduction
JavaScript (JS) has become one of the most popular programming languagesbecause it is lightweight, flexible, and powerful [1]. Developers use JS to buildweb pages and games. JS has many new traits (1) it is dynamic and weakly ∗ Corresponding author
Email addresses: [email protected] (Zijian Jiang), [email protected] (Hao Zhong), [email protected] (Na Meng) a r X i v : . [ c s . S E ] F e b yped; (2) it has first-class functions; (3) it is a class-free, object-oriented pro-gramming language that uses prototypal inheritance instead of classical inheri-tance; and (4) objects in JS inherit properties from other objects directly andall these inherited properties can be changed at runtime. All above-mentionedtraits make JS unique and powerful; they also make JS programs very challeng-ing to maintain and reason about [2, 3, 4].To reduce the cost of maintaining software, researchers proposed approachesthat recommend code co-changes [5, 6, 7, 8]. For instance, Zimmermann etal. [5] and Rolfsnes et al. [6] mined co-change patterns of program entities fromsoftware version history and suggested co-changes accordingly. Wang et al. [7, 8]studied the co-change patterns of Java program entities and built CMSuggesterto suggest changes accordingly for any given program commit. However, existingtools do not characterize any co-change patterns between JS software entities,neither do they recommend changes by considering the unique language featuresof JS or the mined co-changed patterns from JS programs (see Section 8.3) fordetailed discussions).To overcome the limitations of the prior approaches, in this paper, we firstconducted a study on 14,747 program commits from 10 open-source JS projectsto investigate (1) what software entities are usually edited together, and (2) howthose simultaneously edited entities are related. Based on this characterizationstudy for co-change patterns, we further developed a learning-based approachCoRec to recommend changes given a program commit.Specifically in our study, for any program commit, we constructed and com-pared Abstract Syntax Trees (ASTs) for each edited JS file to identify all editedentities (e.g., Deleted Classes (DC), Changed Functions (CF), and Added Vari-ables (AV)). Next, we created change dependency graphs (CDGs) for each com-mit by treating edited entities as nodes and linking entities that have referencer-referencee relations. Afterwards, we extracted common subgraphs betweenCDGs and regarded those common subgraphs as recurring change patterns.In our study, we explored the following research question: RQ1: What are the frequent co-change patterns in JS programs?
We automatically analyzed thousands of program commits from ten JS projectsand revealed the recurring co-change patterns in each project. By manuallyinspecting 20 commits sampled for each of the 3 most popular patterns, weobserved that 80%–90% of co-changed function pairs either invoke the samefunction(s), access the same variable(s), contain similar statement(s), or getfrequently co-changed in version history.Besides the above findings, our study reveals three most popular change pat-terns: (i) one or more caller functions are changed together with one changedcallee function that they commonly invoke; (ii) one or more functions arechanged together to commonly invoke an added function; (iii) one or morefunctions are changed together to commonly access an added variable. Theco-changed callers in each pattern may share commonality in terms of variableaccesses, function invocations, code similarity, or evolution history.2ased on the above-mentioned observations, we built a machine learning(ML)-based approach—CoRec—to recommend functions for co-change. Giventhe commits that contain matches for any of the above-mentioned co-changepatterns, CoRec extracts 10 program features to characterize the co-changedfunction pairs, and uses those features to train an ML model. Afterwards, givena new program commit, the model predicts whether any unchanged functionshould be changed as well and recommends changes whenever possible. WithCoRec, we investigated the following research question:
RQ2: How does CoRec perform when suggesting co-changes basedon the observed three most popular patterns?
We applied CoRec and two existing techniques (i.e., ROSE [5] and TransitiveAssociate Rules (TAR) [9]) to the same evaluation datasets, and observed CoRecto outperform both techniques by correctly suggesting many more changes.CoRec’s effectiveness varies significantly with the ML algorithm it adopts. CoRecworks better when it trains three separate ML models corresponding to the threepatterns than training a unified ML model for all patterns. Our results showthat CoRec can recommend co-change functions with 73–78% accuracy; it sig-nificantly outperforms two baseline techniques that suggest co-changes purelybased on software evolution.We envision CoRec to be used in the integrated development environments(IDE) for JS, code review systems, and version control systems. In this way,after developers make code changes or before they commit edits to softwarerepositories, CoRec can help detect and fix incorrectly applied multi-entity ed-its. In the sections below, we will first describe a motivating example (Sec-tion 2), and then introduce the concepts used in our research (Section 3). Next,we will present the empirical study to characterize co-changes in JS programs(Section 4). Afterwards, we will explain our change recommendation approachCoRec (Section 5) and expound on the evaluation results (Section 6).
2. A Motivating Example
The prior work [10, 11, 12, 13] shows that developers may commit errorsof omission (i.e., forgetting to apply edits completely) when they have to editmultiple program locations simultaneously in one maintenance task (i.e., bugfixing, code improvement, or feature addition). For instance, Fry et al. [10]reported that developers are over five times more precise at locating errorsof commission than errors of omission. Yin et al. [12] and Park et al. [13]separately showed that developers introduced new bugs when applying patchesto fix existing bugs. In particular, Park et al. inspected the supplementary bugfixes following the initial bug-fixing trials, and summarized nine major reasonsto explain why the initial fixes were incorrect. Two of the nine reasons wereabout the incomplete program edits applied by developers.To help developers apply JS edits completely and avoid errors of omission, wedesigned and implemented a novel change recommendation approach—CoRec.3his section overviews our approach with a running example, which is extractedfrom a program commit to Node.js—an open-source server-side JS runtime envi-ronment [14]. Figure 1 shows a simplified version of the exemplar program com-mit [15]. In this revision, developers added a function maybeCallback(...) to checkwhether the pass-in parameter cb is a function, and modified seven functions indistinct ways to invoke the added function(e.g., changing fs.write(...) on line10 and line 14). The seven functions include: fs.rmdir(...) , fs.appendFile(...) , fs.truncate(...) , fs.write(...) , fs.readFile(...) , fs.writeFile(...) , and fs.writeAll(...) [15]. However, developers forgot to change an eighth function— fs.read(...) —to also invoke the added function (see line 19 in Figure 1).
1. + function maybeCallback(cb) {2. + return typeof cb === 'function' ? cb : rethrow();3. + } 4. fs.write = function (fd, buffer, offset, length, position, callback) {5. - callback = makeCallback(arguments[arguments.length - 1]);6. …7. req.oncomplete = wrapper;8. if (buffer instanceof Buffer) {9. …10.+ callback = maybeCallback(callback);11. return binding.writeBuffer(…);12. }13. …14.+ callback = maybeCallback(position);15. return binding.writeBuffer(fd, buffer, offset, …);16. } 17. fs.read = function (fd, buffer, offset, length, position, callback) {18.- callback = makeCallback(arguments[arguments.length - 1]);19. … // an edit that developers forgot to apply://+ callback = maybeCallback(callback);
20. req.oncomplete = wrapper;21. binding.read(fd, buffer, offset, …);22. }
Figure 1: A program commit should add one function and change eight functions to invokethe newly added one. However, developers forgot to change one of the eight functions— fs.read(...) [15].CoRec reveals the missing change with the following steps. CoRec first trainsan ML model with the program co-changes extracted from Node.js softwareversion history. Then given the exemplar commit, based on the added function maybeCallback(...) and each changed function (e.g., fs.write(...) ), CoRec ex-tracts any commonality between the changed function and any unchanged one.For each function pair, CoRec applies its ML model to the extracted commonal-ity features and predicts whether the function pair should be changed together.Because fs.write(...) and fs.read(...) • commonly access one variable binding , • commonly invoke two functions: makeCallback(...) and wrapper(...) , • declare the same parameters in sequence, • have token-level similarity as 41%, and • have statement-level similarity as 42%,4 onst Rectangle = class {constructor(height, width) { this .height = height; this .width = width;}area() { return this .height * this .width;}};console.log( new
Rectangle(5, 8).area()); class
Rectangle{constructor(height, width) { this .area = height * width;}}console.log( new
Rectangle(5, 8).area); (a) (b)
Figure 2: A JS class can be defined with an expression (see (a)) or a declaration (see (b)). the pre-trained ML model inside CoRec considers the two functions to share suf-ficient commonality and thus recommends developers to also change fs.read(...) to invoke maybeCallback(...) . In this way, CoRec can suggest entities for change,which edits developers may otherwise miss.
3. Terms and Definitions
This section first introduces concepts relevant to JS programming, and thendescribes the terminology used in our research.
ES6 and ES5.
ECMA Script is the standardized name for JavaScript [16].ES6 (or ECMAScript2015) is a major enhancement to ES5, and adds manyfeatures intended to make large-scale software development easier. ES5 is fullysupported in all modern browsers, and major web browsers support some fea-tures of ES6. Our research is applicable to both ES5 and ES6 programs.
Software Entity.
We use software entity to refer to any defined JS class , function , variable , or any independent statement block that is not con-tained by the definition of classes, functions, or variables. When developerswrite JS code, they can define each type of entities in multiple alternative ways.For instance, a class can be defined with a class expression (see Figure 2 (a)) orclass declaration (see Figure 2 (b)). Similarly, a function can be defined witha function expression or function declaration. A variable can be defined with avariable declaration statement; the statement can either use keyword const todeclare a constant variable, or use let or var to declare a non-constant variable. Edited Entity.
When maintaining JS software, developers may add, delete,or change one or more entities. Therefore, as with prior work [17], we defineda set of edited entities to describe the possible entity-level edits, including
Added Class ( AC ), Deleted Class ( DC ), Added Function ( AF ), Deleted Func-tion ( DF ), Changed Function ( CF ), Added Variable ( AV ), Deleted Variable ( DV ), Changed Variable ( CV ), Added Statement Block ( AB ), Deleted State-ment Block ( DB ), and Changed Statement Block ( CB ). For example, if a newclass is declared to have a constructor and some other methods, we consider therevision to have one AC, multiple AFs, and one or more AV (depending on howmany fields are defined in the constructor).5 . Program Differencingfor each ( ! " , ! ) • AC, AV, AF, AB • DC, DV, DF, DB • CV, CF, CB 1. AST Parsing ( $%& ' , $%& ( ) (f1_old, f1_new)(f2_old, f2_new)……(fn_old, fn_new) Commit c
2. Entity Extraction( $%& ' , $%& ( ) (ES o , ES n ) Figure 3: The procedure to extract changed entities given a commit.
Multi-Entity Edit and CDG.
As with prior work [18], we use multi-entity edit to refer to any commit that has two or more edited entities . Weuse change dependency graph (CDG) to visualize the the relationship be-tween co-changed entities in a commit. Specifically, each CDG has at least twonodes and one edge. Each node represents an edited entity, and each edge rep-resents the referencer-referencee relationship between entities (e.g., a functioncalls another function). Namely, if an edited entity E refers to another editedentity E , we say E depends on E . A related CDG is constructed to connectthe two entities with a directed edge pointing to E —the entity being dependedupon (i.e. E → E ). For each program commit, we may create zero, one, ormultiple CDGs.
4. Characterization Study
This section introduces our study methodology (Section 4.1) and explainsthe empirical findings (Section 4.2). The purpose of this characterization studyis to identify recurring change pattern (RCP) of JS programs. An RCP is aCDG subgraph that is commonly shared by the CDGs from at least two distinctcommits. RCPs define different types of edits, and serve as the templates of co-change rules. Our approach in Section 5 mines concrete co-change rules for themost common RCPs.
We implemented a tool to automate the analysis. Given a set of programcommits in JS repositories, our tool first characterizes each commit by extract-ing the edited entities (Section 4.1.1) and constructing CDG(s) (Section 4.1.2).Next, it compares CDGs across commits to identify RCPs (Section 4.1.3).
As shown in Figure 3, we took three steps to extract any edited entities foreach commit.
Step 1: AST Parsing . Given a program commit c , this step first locates theold and new versions of each edited JS file. For every edited file ( f o , f n ), thisstep adopts Esprima [19] and typed-ast-util [20] to generate Abstract SyntaxTrees (ASTs): ( ast o , ast n ). Esprima is a high performance, standard-compliantJavaScript parser that supports the syntax of both ES5 and ES6; however, it6 igure 4: Extracting edited entities from a program commit of Meteor [21]. cannot infer the static type binding information of any referenced class, function,or variable. Meanwhile, given JS files and the project’s package.json file, typed-ast-util produces ASTs annotated with structured representations of TypeScripttypes, which information can facilitate us to precisely identify the referencer-referencee relationship between edited entities. We decided to use both tools fortwo reasons. First, when a project has package.json file, we rely on Esprima toidentify the code range and token information for each parsed AST node, andrely on typed-ast-util to attach relevant type information to those nodes. Sec-ond, if a project has no package.json file, Esprima is still used to generate ASTsbut we defined a heuristic approach (to be discussed later in Section 4.1.2) toidentify the referencer-referencee relationship between entities with best efforts.To facilitate our discussion, we introduce a working example from a pro-gram revision [21] of Meteor [22]. As shown in Figure 4, the program revi-sion changes seven JS files. In this step, CoRec creates a pair of ASTs foreach edited file and stores the ASTs into JSON files for later processing (e.g., tools/buildmessages-ast.json (old) and tools/buildmessages-ast.json (new) ). Step 2: Entity Extraction . From each pair of ASTs ( ast o , ast n ) (i.e., JSONfiles), this step extracts the entity sets ( ES o , ES n ). In the example shown inFigure 4, ES o lists all entities from the old JS file, and ES n corresponds to thenew file. We defined four kinds of entities to extract: variables (V), functions(F), classes (C), and statement blocks (B). A major technical challenge hereis how to extract entities precisely and consistently . Because JS programmingsupports diverse ways of defining entities and the JS syntax is very flexible,we cannot simply check AST node types of statements to recognize entity def-initions. For instance, a variable declaration statement can be interpreted asa variable-typed entity or a statement block, depending on the program con-text. To eliminate ambiguity and avoid any confusion between differently typedentities, we classify and extract entities in the following way: • A code block is treated as a function definition if it satisfies either of the fol-lowing two requirements. First, the AST node type is “
FunctionDeclaration ”(e.g., runBenchmarks() on line 7 in Figure 5) or “
MethodDefinition ”. Sec-ond, (1) the block is either a “
VariableDeclaration ” statement (e.g., const etRectArea = function(...) { ... } ; ) or an“ Assignment ” expression (see line11 and line 20 of Figure 5); and (2) the right-side operand is either“
FunctionExpression ”, or “
CallExpression ” that outputs another function asreturn value of the called function. In particular, if any defined functionhas its prototype property explicitly referenced (e.g.,
Benchmark.prototype on lines 20 and 24) or is used as a constructor to create any object (e.g.,line 12), we reclassify the function definition as a class definition, becausethe function usage is more like the usage of a class. • A code block is considered to be a class definition if it meets either of thefollowing two criteria. First, the block uses keyword class . Second, theblock defines a function, while the codebase either references the function’sprototype (e.g.,
Benchmark.prototype on lines 20 and 24 in Figure 5) or usesthe function as a constructor to create any object (see line 12). • A code block is treated as a variable declaration if (1) it is either a“
VariableDeclaration ” statement (e.g., var silent = ... on line 2 in Fig-ure 5) or an “
Assignment ” expression, (2) it does not define a function orclass, (3) it does not belong to the definition of any function but maybelong to a constructor (see lines 15-17), and (4) it does not declare arequired module (see line 1). Particularly, when a variable declaration isan assignment inside a class constructor (e.g., lines 15-17), it is similar tothe field declaration in Java. • A code block is treated as a statement block if (1) it purely containsstatements, (2) it does not define any class, function, or variable, and (3)it does not belong to the definition of any class or function. For example,lines 3-6 in Figure 5 are classified as a statement block.
Step 3: Program Differencing . To identify any edited entity between ES o and ES n , we first matched the definitions of functions, variables, and classesacross entity sets based on their signatures. If any of these entities (e.g., afunction definition) solely exists in ES o , an entity-level deletion (e.g., DF) isinferred; if an entity (e.g., a variable definition) solely exists in ES n , an entity-level insertion (e.g., AV) is inferred. Next, for each pair of matched entities,we further exploited a fine-grained AST differencing tool—GumTree [23]—toidentify expression-level and statement-level edits. If any edit is reported, weinferred an entity-level change (e.g., CF shown in Figure 4). Additionally, wematched statement blocks across entity sets based on their string similarities.Namely, if a statement block b ∈ ES o has the longest common subsequencewith a block b ∈ ES n and the string similarity is above 50%, we consideredthe two blocks to match. Furthermore, if the similarity between two matchedblocks is not 100%, we inferred a block-level change CB. For each program commit, we built CDGs by representing the edited entitiesas nodes, and by connecting edited entities with directed edges if they have either8 /A
1. var assert = require(‘assert’);
2. var silent = +process.env.NODE_BENCH_SILENT;
3. if (module === require.main) {4. …5. runBenchmarks();6. }
7. function runBenchmarks() {8. var test = test.shift();9. …10.} .creatBenchmark = function (fn, options) {12. return new
Benchmark(fn, options);13.}
Benchmark(fn, options) {15. this .fn = fn;16. this .options = options;17. this .config = parseOpts(options);18. …19.}20.Benchmark. prototype .report = function (value) {21. var heading = this .getHeading();22. …23.};24.Benchmark. prototype .getHeading = function () {25. var conf = this .config;26. …27.} variablestatement blockfunctionfunction classfunctionfunctionvariablevariablevariable
Figure 5: Code snippets from the file benchmark.common.js of Node.js in revision 00a1d36 [15],whose related entity types are shown on the right. of the following two kinds of relationship: • Access.
If an entity E accesses another entity E (i.e., by reading/writinga variable, invoking a function, or using a class), we consider E to be de-pendent on E . • Containment.
If the code region of E is fully covered by that of E , weconsider E to be dependent on E .The technical challenge here is how to identify the relationship betweenedited entities. We relied on ESprima’s outputs to compare code regions betweenedited entities in order to reveal the containment relations. Additionally, when package.json file is available, we leveraged the type binding information inferredby typed-ast-util to identify the access relationship. For instance, if there is afunction call bar() inside an entity E while bar() is defined by a JS module f2 ,then typed-ast-util can resolve the fully qualified name of the callee function as f2.bar() . Such resolution enables us to effectively link edited entities no matterwhether they are defined in the same module (i.e., JS file) or not.Since some JS projects have no package.json file, we could not adopt typed-ast-util to resolve bindings in such scenarios. Therefore, we also built a simplerbut more applicable approach to automatically speculate the type binding in-formation of accessed entities as much as possible. Specifically, suppose that file f1 defines E to access E . To resolve E and link E with E ’s definition, thisintuitive approach first scans all entities defined in f1 to see whether there isany E definition locally. If not, this approach further examines all require and import statements in f1 , and checks whether any required or imported module9 . + var spaces = function (n) {2. + return _.times(n, function () { return ' ' }).join(’’);3. + };
4. var capture = function (option, f) {5. …6. - console.log("START CAPTURE", nestingLevel, options.title, "took " + (end - start));7. + console.log(spaces(nestingLevel * 2), "START CAPTURE", nestingLevel, options.title, "took " + (end - start));8. … 9. } 1. main.registerCommand({…}, function (options) {2. …3. - var messages = buildmessage.capture( function () {4. + var messages = buildmessage.capture({ title: 'Combining constraints' }, function (){5. allPackages = project.getCurrentCombinedConstraints();6. });7. … tools/buildmessage.js tools/commands-packages.js
Figure 6: A simplified program commit that adds a function spaces(...) , changes a function capture(...) , and changes a statement block [21] CB tools.commands-packages. statement_block_69933 Function Invocation CF tools.buildmessage. capture(options, f) AF tools.buildmessage. spaces(n) Function Invocation
Figure 7: The CDG corresponding to the program commit shown in Figure 6 defines a corresponding entity with E ’s name; if so, this approach links E withthe retrieved E ’s definition.Compared with typed-ast-util, our approach is less precise because it cannotinfer the return type of any invoked function. For instance, if we have const foo= bar() where bar() returns a function, our approach simply assumes foo to be avariable instead of a function. Consequently, our approach is unable to link foo ’sdefinition with any of its invocations. Based on our experience of applying bothtyped-ast-util and the heuristic method to the same codebase (i.e., nine open-source projects), the differences between these two methods’ results account forno more than 5% of all edited entities. It means that our heuristic method isstill very precise even though no package.json file is available.Figures 6 and 7 separately present the code changes and CDG related to tools/buildmessage.js , an edited file mentioned in Figure 4. According to Fig-ure 6, the program commit modifies file tools/buildmessage.js by defining anew function spaces(...) and updating an existing function capture(...) toinvoke the new function. It also changes file tools/commands-package.js by up-dating the function invocation of capture(...) inside a statement block (i.e., main.registerCommand(...); ). Given the old and new versions of both edited JSfiles, our approach can construct the CDG shown in Figure 7. In this CDG,each directed edge starts from a dependent entity E , and points to the entityon which E depends. Each involved function, variable, or class has its fullyqualified name included in the CDG for clarity. As statement blocks have nofully qualified names, we created a unique identifier for each block with (1) themodule name (e.g., tools.commands-packages ) and (2) index of the block’s firstcharacter in that module (e.g., ). 10 able 1: Subject projects Project Description
Node.js Node.js [14] is a cross-platform JSruntime environment. It executes JScode outside of a browser. 1,755 2,701 11,287Meteor Meteor [22] is an ultra-simpleenvironment for building modern webapplications. 255 3,011 10,274Ghost Ghost [25] is the most popularopen-source and headless Node.js contentmanagement system (CMS) forprofessional publishing. 115 1,263 5,142Habitica Habitica [26] is a habit building programthat treats people’s life like a RolePlaying Game. 129 1,858 6,116PDF.js PDF.js [27] is a PDF viewer that is builtwith HTML5. 104 1,754 4,255React React [28] is a JS library for buildinguser interfaces. 286 1,050 4,415Serverless Serverless [29] is a framework used tobuild applications comprised ofmicroservices that run in response toevents. 63 1,110 3,846Webpack Webpack [30] is a module bundler, whichmainly bundles JS files for usage in abrowser. assets. 37 1,099 3,699Storybook Storybook [31] is a developmentenvironment for UI components. 43 528 2,277Electron Electron [32] is a framework thatsupports developers to writecross-platform desktop applications usingJS, HTML, and CSS. 35 673 1,898
As with prior work [18], we extracted RCPs by comparing CDGs acrossprogram commits. Intuitively, given a CDG g from commit c and the CDG g from commit c , we matched nodes based on their edit-entity labels (e.g.,AF) while ignoring the concrete code details (e.g., tools.buildmessage.spaces(n) in Figure 7). We then established edge matches based on those node matches.Namely, two edges are matched only if they have matching starting nodes andmatching ending nodes. Next, based on all established matches, we identifiedthe largest common subgraph between g and g using the off-the-shelf subgraphisomorphism algorithm VF2 [24]. Such largest common subgraphs are consid-ered as RCPs because they commonly exist in CDGs of different commits. To characterize JS code changes, we applied our study approach to a subsetof available commits in 10 open-source projects, as shown in Table 1. We chosethese projects because (1) they are popularly used; (2) they are from different11 igure 8: Commit distributions based on the number of edited entities each of them contains application domains; and (3) they contain a large number of available commits.For simplicity, to sample the commits that may fulfill independent maintenancetasks, we searched each software repository for commits whose messages containany of the following keywords: “bug”, “fix”, “error”, “adjust”, and “failure”.Table 1 shows the statistics related to the sampled commits. In particular,column presents the code size of each project (i.e., the numberof kilo lines of code (KLOC)). Column reports the numberof commits identified via our keyword-based search. Column reports the number of edited entities extracted from those sampledcommits. According to this table, the code size of projects varies significantlyfrom 35 KLOC to 1755 KLOC. Among the 10 projects, 528–3,011 commitswere sampled, and 1,898–11,287 edited entities were included for each project.Within these projects, only Node.js has no package.json file, so we adopted ourheuristic approach mentioned in Section 4.1.2 to link edited entities. For theremaining nine projects, as they all have package.json files, we leveraged the typebinding information inferred by typed-util-ast to connect edited entities.
We first clustered commits based on the number of edited entities they con-tain. Because the commit distributions of different projects are very similar toeach other, we present the distributions for four projects in Figure 8. Among the10 projects, 41%–52% of commits are multi-entity edits. Specifically, 15%–19%of commits involve two-entity edits, and 7%–10% of commits are three-entity12 able 2: Multi-entity edits and created CDGs
Project
Node.js 1,401 785 56%Metoer 1,445 670 46%Ghost 604 356 59%Habitica 962 345 36%PDF.js 711 372 52%React 538 320 60%Serverless 480 171 36%Webpack 483 253 52%Storybook 243 119 49%Electron 277 123 44% edits. The number of commits decreases as the number of edited entities in-creases. The maximum number of edited entities appears in Node.js, wherea single commit modifies 335 entities. We manually checked the commit onGitHub [33], and found that four JS files were added and three other JS fileswere changed to implement HTTP/2.We noticed that about half of sampled program revisions involve multi-entityedits. This observation implies the importance of co-change recommendationtools. When developers have to frequently edit multiple entities simultaneouslyto achieve a single maintenance goal, it is crucially important to provide au-tomatic tool support that can check for the completeness of code changes andsuggest any missing change when possible. In order to build such tools, we de-cided to further explore relations between co-changed entities (see Section 4.2.2).
Finding 1:
Among the 10 studied projects, 41–52% of studied commitsare multi-entity edits. It indicates the necessity of our research to char-acterize multi-entity edits and to recommend changes accordingly.4.2.2. Commit Distributions Based on The Number of CDGs
We further clustered multi-entity edits based on the number of CDGs con-structed for each commit. As shown in Table 2, our approach created CDGs for36–60% of the multi-entity edits in distinct projects. On average, 49% of multi-entity edits contain at least one CDG. Due to the complexity and flexibility ofthe JS programming language, it is very challenging to statically infer all possi-ble referencer-referencee relationship between JS entities. Therefore, the actualpercentage of edits that contain related co-changed entities can be even higherthan our measurement. Figure 9 presents the distributions of multi-entity editsbased on the number of CDGs extracted. Although this figure only presents thecommit distributions for four projects: Node.js, Meteor, Ghost, and Habitica,13 % of commits with CDG(s)
Node.js Ghost
Meteor % of commits with CDG(s)
Habitica % of commits with CDG(s)
Figure 9: The distributions of multi-entity edits based on the number of CDGs we observed similar distributions in other projects as well. As shown in thisfigure, the number of commits decreases significantly as the number of CDGsincreases. Among all 10 projects, 73%–81% of commits contain single CDGs,9%–18% of commits have two CDGs extracted, and 3%–7% of commits havethree CDGs each. The commit with the largest number of CDGs constructed(i.e., 16) is the one with the maximum number of edited entities in Node.js asmentioned above in Section 4.2.1.The high percentage of multi-entity edits with CDGs extracted (i.e., 49%)implies that JS programmers usually change syntactically related entities simul-taneously in program revisions. Such syntactic relevance between co-changedentities enlightened us to build tools that recommend co-changes by observingthe syntactic dependences between changed and unchanged program entities.To concretize our approach design for co-change recommendations, we furtherexplored the recurring syntactic relevance patterns between co-changed entities,i.e., RCPs (see Section 4.2.3).
Finding 2:
For 36–60% of multi-entity edits in the studied projects, ourapproach created at least one CDG for each commit. It means that manysimultaneously edited entities are syntactically relevant to each other.4.2.3. Identified RCPs
By comparing CDGs of distinct commits within the same project repository,we identified RCPs in all projects. As listed in Table 3, 35–221 RCPs are ex-tracted from individual projects. In each project, there are 113–782 commits14 able 3: Recurring change patterns and their matches
Projects
Node.js 221 782 2,385Metoer 200 658 1,719Ghost 133 351 1,223Habitica 116 339 706PDF.js 86 367 640React 110 316 899Serverless 57 164 372Webpack 80 243 583Storybook 42 113 337Electron 35 117 228 that contain matches for RCPs. In particular, each project has 228–2,385 sub-graphs matching RCPs. By comparing this table with Table 2, we found that95%–100% of the commits with CDGs extracted have matches for RCPs. Itmeans that if one or more CDGs can be built for a commit, the commit is verylikely to share common subgraphs with some other commits. In other words,simultaneously edited entities are usually correlated with each other in a fixednumber of ways. If we can characterize the frequently occurring relationshipbetween co-changed entities, we may be able to leverage such characterizationto predict co-changes or reveal missing changes.By accumulating the subgraph matches for RCPs across projects, we iden-tified five most popular RCPs, as shown in Figure 10. Here, P1 means thatwhen a callee function is changed, one or more of its caller functions are alsochanged. P2 means that when a new function is added, one or more existingfunctions are changed to invoke that new function. P3 shows that when a newvariable is added, one or more existing functions are changed to read/write thenew variable. P4 presents that when a new variable is added, one or more newfunctions are added to read/write the new variable. P5 implies that when afunction is changed, one or more existing statement blocks invoking the functionare also changed. Interestingly, the top three patterns commonly exist in all 10projects, while the other two patterns do not exist in some of the projects. Thetop three patterns all involve simultaneously changed functions. Finding 3:
Among the commits with CDGs extracted, 95%–100% ofcommits have matches for mined RCPs. In particular, the most popularthree RCPs all involve simultaneously changed functions.4.2.4. Case Studies for The Three Most Popular RCPs
We conducted two sets of case studies to understand (1) the semantic mean-ings of P1–P3 and (2) any co-change indicators within code for those patterns.15 attern IndexPattern Shape P1 P2 P3 P4 P5
Function Invocation Variable Read/Write*CF: One or more changed functions *CB: One or more changed statement blocksf v CB Figure 10: 5 most popular recurring change patterns among the 10 projects
In each set of case studies, we randomly sampled 20 commits matching each ofthese RCPs and manually analyzed the code changes in all 60 commits.
The Semantic Meanings of P1–P3 . In the 20 commits sampled for eachpattern, we summarized the semantic relevance of entity-level changes as below.
Observations for P1 (*CF f −→ CF) . We found the caller and callee func-tions changed together in three typical scenarios. First, in about 45% of the in-spected commits, both caller and callee functions experienced consistent changes to invoke the same function(s), access the same variable(s), or execute the samestatement(s). Second, in about 40% of the commits, developers applied adap-tive changes to callers when callees were modified. The adaptive changes involvemodifying caller implementations when the signatures of callee functions wereupdated, or moving code from callees to callers. Third, in 15% of cases, callersand callees experienced seemingly irrelevant changes.
Observations for P2 (*CF f −→ AF).
Such changes were applied for twomajor reasons. First, in 65% of the studied commits, the added function im-plemented some new logic, which was needed by the changed caller function.Second, in the other 35% of cases, changes were applied for refactoring purposes.Namely, the added function was extracted from one or more existing functionsand those functions were simplified to just invoke the added function.
Observations for P3 (*CF v −→ AV).
Developers applied such changes in twotypical scenarios. First, in 60% of cases, developers added a new variable forfeature addition, which variable was needed by each changed function (i.e., cross-cutting concern [34]). Second, in 40% of the cases, developers added variablesfor refactoring purposes. For instance, some developers added a variable toreplace a whole function, so all caller functions of the replaced function wereconsistently updated to instead access the new variable. Additionally, someother developers added a variable to replace some expression(s), constant(s), orvariable(s). Consequently, the functions related to the replaced entities wereconsistently updated for the new variable.
The Code Indicators for Co-Changes in P1–P3 . To identify potentialways of recommending changes based on the mined RCPs, we randomly picked20 commits matching each pattern among P1–P3; we ensured that each sampledcommit has two or more co-changed functions (e.g., *CF) referencing another16dited entity. We then inspected the co-changed functions in each commit, todecide whether they share any commonality that may indicate their simultane-ous changes. As shown in Table 4, the three case studies I–III correspond to thethree patterns P1–P3 in sequence. In our manual analysis, we mainly focusedon four types of commonality: • FI : The co-changed functions commonly invoke one or more peer functions of the depended entity E (i.e., CF in P1, AF in P2, and AV in P3). Here, peer function is any function that is defined in the same file as E . • VA : The co-changed functions commonly access one or more peer variables of the depended entity E . Here, peer variable is any variable that isdefined in the same file as E . • ST : The co-changed functions commonly share at least 50% of their to-ken sequences. We calculated the percentage with the longest commonsubsequence algorithm between two token strings. • SS : The co-changed functions commonly share at least 50% of their state-ments. We computed the percentage by recognizing identical statementsbetween two given functions f1(...) and f2(...) . Assume that the twofunctions separately contain n and n statements, and the number ofcommon statements is n . Then the percentage is calculated as n × n + n × Table 4: Commonality observed between the co-changed functions
Case Study Commonality NoFI VA ST SS Commonality
I (for P1: *CF f −→ CF) 8 5 7 4 4II (for P2: *CF f −→ AF) 12 7 8 6 2III (for P3: *CF v −→ AV) 6 13 6 5 3
According to Table 4, 80%–90% of co-changed functions share certain com-monality with each other. There are only 2–4 commits in each study where theco-changed functions share nothing in common. Particularly, in the first casestudy, the FI commonality exists in eight commits, VA exists in five commits,ST occurs in seven commits, and SS occurs in four commits. The summation ofthese commonality occurrences is larger than 20, because the co-changed func-tions in some commits share more than one type of commonality. Additionally,the occurrence rates of the four types of commonality are different between casestudies. For instance, FI has 8 occurrences in the first case study; it occurs in12 commits of the second study and occurs in only 6 commits of the third study.As another example, most commits (i.e., 13) in the third study share the VAcommonality, while there are only 5 commits in the first study having such com-monality. The observed differences between our case studies imply that when17 oftware Repository of Program P
Phase I: Commit Crawling
Extraction of Edited Entities CDG Construction RCP Matching
Commits matching P1, P2, or P3
Matches for *CF Unchanged functions Matches for E (CF, AF, or AV)
Phase II: Training
Single-Function Features (2) Commonality Features between Functions (7) Co-Evolution Feature between Functions (1)
Feature Extraction Machine Learning Three Alternative Classifiers
A New Commit of P
Function Pair Preparation
Phase III: Testing
Any Function to Co-Change
Figure 11: CoRec consists of three phases: commit crawling, training, and testing developers apply multi-entity edits matching different RCPs, the commonalityshared between co-changed functions also varies.
Finding 4:
When inspecting the relationship between co-changed func-tions in three case studies, we found that these functions usually sharecertain commonality. This indicates great opportunities for developingco-change recommendation approaches.
5. Our Change Recommendation Approach: CoRec
In our characterization study (see Section 4), we identified three most pop-ular RCPs: *CF f −→ CF, *CF f −→ AF, and *CF v −→ AV. In all these patterns, thereis at least one or more changed functions (i.e., *CF) that references anotheredited entity E (i.e., CF, AF, or AV). In the scenarios when two or moreco-changed functions commonly depend on E , we also observed certain com-monality between those functions. This section introduces our recommendationsystem—CoRec—which is developed based on the above-mentioned insights.As shown in Figure 11, CoRec has three phases. In the following subsections(Sections 5.1-5.3), we explain each phase in detail. Given the repository of a project P , Phase I crawls commits to locate anydata usable for machine learning. Specifically, for each commit in the repository,this phase reuses part of our study approach (see Sections 4.1.1 and 4.1.2) toextract edited entities and to create CDGs. If a commit c has any subgraphmatching P1, P2, or P3, this phase recognizes the entity E m matching E (i.e.,an entity matching CF in P1, matching AF in P2, or matching AV in P3) andany co-changed function matching *CF. We denote these co-changed function(s)18 able 5: A list of features extracted for function pair ( f , f ) Id Feature Id Feature E m -relevant parametertypes in f f and f have the samereturn type2 Whether f has the E m -related type 7 Whether f and f are defined in thesame way3 Number of common peer variables 8 Token similarity4 Number of common peer functions 9 Statement similarity5 Number of common parameter types 10 Co-evolution frequency with CF Set = { cf , cf , . . . } , and denote the unchanged function(s) in editedJS files from the same commit with U F Set = { uf , uf , . . . } . If CF Set hasat least two co-changed functions, CoRec considers the commit to be usable formodel training and passes E m , CF Set , and
U F Set to the next phase.
This phase has two major inputs: the software repository of program P,and the extracted data from each relevant commit (i.e., E m , CF Set , and
U F Set ). In this phase, CoRec first creates positive and negative trainingsamples, and then extracts features for each sample. Next, CoRec trains a ma-chine learning model by applying Adaboost (with Random Forest as the “weaklearner”) [35] to the extracted features. Specifically, to create positive samples,CoRec enumerates all possible function pairs in
CF Set , because each pair ofthese functions were co-changed with E m . We represent the positive sampleswith P os = { ( cf , cf ) , ( cf , cf ) , ( cf , cf ) , . . . } . To create negative samples,CoRec pairs up each changed function cf ∈ CF Set with an unchanged function uf ∈ U F Set , because each of such function pairs were not co-changed. Thus,we denote the negative samples as
N eg = { ( cf , uf ) , ( uf , cf ) , ( cf , uf ) , . . . } .By preparing positive and negative samples in this way, given certain pair offunctions, we expect the trained model to predict whether the functions shouldbe co-changed or not.CoRec extracts 10 features for each sample. As illustrated in Figure 11, twofeatures reflect code characteristics of the second function in the pair, sevenfeatures capture the code commonality between functions, and one feature fo-cuses on the co-evolution relationship between functions. Table 5 presents moredetails of each feature. Specifically, the 1 st and 2 nd features are about therelationship between f and E m . Their values are calculated as below: • When E m is CF or AF, the 1 st feature records the number of types used in f that match any declared parameter type of E m . Intuitively, the moretype matches, the more likely that f should be co-changed with E m . The2 nd feature checks whether the code in f uses the return type of E m . • When E m is AV, the 1 st feature is set to zero, because there is no param-eter type involved in variable declaration. The 2 nd feature checks whetherthe code in f uses the data type of the newly added variable.19he 3 rd and 4 th features were calculated in similar ways. Specifically, de-pending on which JS file defines E m , CoRec locates peer variables (i.e., variablesdefined within the same file as E m ) and peer functions (i.e., functions defined inthe same file). Next, CoRec identifies the accessed peer variables (or peer func-tions) by each function in the pair, and intersects the sets of both functions tocount the commonly accessed peer variables (or peer functions). Additionally,the 7 th feature checks whether f and f are defined in the same manner. Inour research, we consider the following five ways to define functions:(1) via FunctionDeclaration , e.g., function foo(...) { ... } ,(2) via VariableDeclaration , e.g., var foo = function(...) { ... } ,(3) via MethodDefinition , e.g.,
Class A { foo(...) { ... }} ,(4) via PrototypeFunction to extend the prototype of an object or a function,e.g., x.prototype.foo = function(...) { ... } , and(5) via certain exports -related statements, e.g., exports.foo = function(...) { ... } and module.exports = { foo: function(...) { ... }} .If f and f are defined in the same way, the 7 th feature is set to true . Finally, the10 th feature assesses in the commit history, how many times the pair of functionswere changed together before the current commit. Inspired by prior work [5],we believe that the more often two functions were co-changed in history, themore likely that they are co-changed in the current or future commits.Depending on the type of E m , CoRec takes in extracted features to actuallytrain three independent classifiers, with each classifier corresponding to onepattern among P1–P3. For instance, one classifier corresponds to P1: *CF f −→ CF.Namely, when E m is CF and one of its caller functions cf is also changed,this classifier predicts whether there is any unchanged function uf that invokes E m and should be also changed. The other two classifiers separately predictfunctions for co-change based on P2 and P3. We consider these three binary-class classifiers as an integrated machine learning model, because all of them cantake in features from one program commit and related software version history,in order to recommend co-changed functions when possible. This phase takes in two inputs—a new program commit c n and the re-lated software version history, and recommends any unchanged function thatshould have been changed by that commit. Specifically, given c n , CoRec reusesthe steps of Phase I (see Section 5.1) to locate E m , CF Set , and
U F Set .CoRec then pairs up every changed function cf ∈ CF Set with every un-changed one uf ∈ U F Set , obtaining a set of candidate function pairs
Candi = { ( cf , uf ) , ( uf , cf ) , ( cf , uf ) , . . . } . Next, CoRec extracts features for eachcandidate p and sends the features to a pre-trained classifier depending on E m ’stype. If the classifier predicts the function pair to have co-change relationship,CoRec recommends developers to also modify the unchanged function in p .20 able 6: Numbers of commits that are potentially usable for model training and testing Project
Node.js 92 77 65Meteor 67 59 39Ghost 21 24 28Habitica 11 8 5PDF.js 14 12 14React 18 12 5Serverless 26 12 8Webpack 22 24 8Storybook 2 1 4Electron 7 3 6
Sum 280 232 182
6. Evaluation
In this section, we first introduce our experiment setting (Section 6.1) and themetrics used to evaluate CoRec’s effectiveness (Section 6.2). Then we explainour investigation with different ML algorithms and present CoRec’s sensitivityto the adopted ML algorithms (Section 6.3), through which we finalize thedefault ML algorithm applied in CoRec. Next we expound on the effectivenesscomparison between CoRec and two existing tools: ROSE [5] and TransitiveAssociate Rules (TAR) [9] (Section 6.4). Finally, we present the comparisonbetween CoRec and a variant approach that trains one unified classifier insteadof three distinct classifiers to recommend changes (Section 6.5).
We mined repositories of the 10 open-source projects introduced in Section 4,and found three distinct sets of commits in each project that are potentially us-able for model training and testing. As shown in Table 6, in total, we found 280commits matching P1, 232 commits matching P2, and 182 commits matchingP3. Each of these pattern matches has at least two co-changed functions (*CF)depending on E m . In our evaluation, for each data set of each project, we coulduse part of the data to train a classifier and use the remaining data to testthe trained classifier. Because Storybook and Electron have few commits, weexcluded them from our evaluation and simply used the identified commits ofthe other eight projects to train and test classifiers.We decided to use k -fold cross validation to evaluate CoRec’s effectiveness.Namely, for every data set of each project, we split the mined commits into k portions roughly evenly; each fold uses ( k −
1) data portions for training and theremaining portion for testing. Among the eight projects, because each projecthas at least five commits matching each pattern, we set k = 5 to diversify ourevaluation scenarios as much as possible. For instance, Habitica has five commitsmatching P3. When evaluating CoRec’s capability of predicting co-changes forHibitica based on P3, in each of the five folds, we used four commits for trainingand one commit for testing. Figure 12 illustrates our five-fold cross validationprocedure. In the procedure, we ensured that each of the five folds adopted a21 ortion 1Portion 2Portion 3Portion 4Portion 5TrainingTesting Commit c1: E m , cf , cf , …cf n … Task t11: E m , cf , {cf , …, cf n }Task t12: E m , cf , {cf , cf , …, cf n } … Task t1n: E m , cf n , {cf , …, cf n-1 }Commit c2: … Figure 12: Typical data processing for each fold of the five-fold cross validationTable 7: Total numbers of prediction tasks involved in the five-fold cross validation
Project
Node.js 398 309 223Meteor 401 229 107Ghost 76 77 99Habitica 30 23 18PDF.js 41 31 35React 72 37 17Serverless 81 38 23Webpack 138 90 22
Sum 1,237 834 544 distinct data portion to construct prediction tasks for testing purposes. For anytesting commit that has n co-changed functions (*CF) depending on E m , i.e., CF Set = { cf , cf , . . . , cf n } , we created n prediction tasks in the following way.In each prediction task, we included one known changed function cf i ( i ∈ [1 , n ])together with E m and kept all the other ( n −
1) functions unchanged. Weregarded the ( n −
1) functions as ground truth ( GT ) to assess how accuratelyCoRec can recommend co-changes given E m and cf i .For instance, one prediction task we created in React includes the followings: E m = src/isomorphic/classic/types.ReactPropTypes.createChainableTypeChecker(...) , cf = src/isomorphic/classic/types.ReactPropTypes.createObjectOfTypeChecker(...) ,and GT = { src/isomorphic/ classic/types.ReactPropTypes.createShapeTypeChecker(...) } .When CoRec blindly pairing cf with any unchanged function, it may extractfeature values as below: feature1 = 1, feature2 = true , feature3 = 0, feature4 =2, feature5 = 0, feature6 = true , feature7 = true , feature8 = 76%, feature9 =45%, feature10 = 1 } . Table 7 shows the total numbers of prediction tasks wecreated for all projects and all patterns among the five-fold cross validation. We defined and used four metrics to measure a tool’s capability of recom-mending co-changed functions: coverage, precision, recall, and F-score. We also22efined the weighted average to measure a tool’s overall effectiveness among allsubject projects for each of the metrics mentioned above.
Coverage (Cov) is the percentage of tasks for which a tool can providesuggestion. Given a task, a tool may or may not recommend any change tocomplement the already-applied edit, so this metric assesses the tool applica-bility. Intuitively, the more tasks for which a tool can recommend one or morechanges, the more applicable this tool is.
Cov = × Coverage varies within [0%, 100%]. If a tool always recommends some change(s)given a task, its coverage is 100%. All our later evaluations for precision, recall,and F-score are limited to the tasks covered by a tool. For instance, supposethat given 100 tasks, a tool can recommend changes for 10 tasks. Then the tool’scoverage is 10 /
100 = 10%, and the evaluations for other metrics are based onthe 10 instead of 100 tasks.
Precision (Pre) measures among all recommendations by a tool, how manyof them are correct:
P re = × This metric evaluates how precisely a tool recommends changes. If all sugges-tions by a tool are contained by the ground truth, the precision is 100%.
Recall (Rec) measures among all the expected recommendations, howmany of them are actually reported by a tool:
Rec = × This metric assesses how effectively a tool retrieves the expected co-changedfunctions. Intuitively, if all expected recommendations are reported by a tool,the recall is 100%.
F-score (F1) measures the accuracy of a tool’s recommendation: F × P re × RecP re + Rec × F-score is the harmonic mean of precision and recall. Its value varies within [0%,100%]. The higher F-scores are desirable, as they demonstrate better trade-offsbetween the precision and recall rates.
Weighted Average (WA) measures a tool’s overall effectiveness amongall experimented data in terms of coverage, precision, recall, and F-score: Γ overall = (cid:80) i =1 Γ i ∗ n i (cid:80) i =1 n i . (6) In the formula, i varies from 1 to 8, representing the 8 projects used in ourevaluation (Storybook and Electron were excluded). Here, i = 1 corresponds toNode.js and i = 8 corresponds to Webpack; n i represents the number of tasks23uilt from the i th project. Γ i represents any measurement value of the i th projectfor coverage, precision, recall, or F-score. By combining such measurementvalues of eight projects in a weighted way, we were able to assess a tool’s overalleffectiveness Γ overall . We designed CoRec to use Adaboost, with Random Forests as the weaklearners to train classifiers. To make this design decision, we tentatively inte-grated CoRec with five alternative algorithms: J48 [36], Random Forest [37],Na¨ıve Bayes [38], Adaboost (default), and Adaboost (Random Forest). • J48 builds a decision tree as a predictive model to go from observationsabout an item (represented in the branches) to conclusions about theitem’s target value (represented in the leaves). • Na¨ıve Bayes calculates the probabilities of hypotheses by applying Bayes’theorem with strong (na¨ıve) independence assumptions between features. • Random Forest is an ensemble learning method that trains a model tomake predictions based on a number of different models. Random Foresttrains a set of individual models in a parallel way. Each model is trainedwith a random subset of the data. Given a candidate in the testing set,individual models make their separate predictions and Random Forest usesthe one with the majority vote as its final prediction. • Adaboost is also an ensemble learning method. However, different fromRandom Forest, Adaboost trains a bunch of individual models (i.e., weaklearners) in a sequential way. Each individual model learns from mis-takes made by the previous model. We tried two variants of Adaboost:(1) Adaboost (default) with decision trees as the weak learners, and (2)Adaboost (Random Forest) with Random Forests as the weak learners.Figure 13 illustrates the effectiveness comparison when CoRec adopts differ-ent ML algorithms. The three subfigures (Figure 13 (a)–(c)) separately presentthe comparison results on the data sets of *CF f −→ CF, *CF f −→ AF, and *CF v −→ AV.We observed similar phenomena in all subfigures. By comparing the first fourbasic ML algorithms (J48, Na¨ıve Bayes, Random Forest, and Adaboost (de-fault)), we noticed that Random Forest achieved the best results in all metrics.Among all datasets, Na¨ıve Bayes obtained the lowest recall and accuracy rates.Although Adaboost obtained the second highest F-score, its coverage is the low-est probably because it uses decision trees as the default weak learners. Basedon our evaluation with the first four basic algorithms, we were curious how wellAdaboost performs if it integrates Random Forests as weak learners. Thus, wealso experimented with a fifth algorithm: Adaboost (Random Forest).As shown in Figure 13, Adaboost (Random Forest) and Random Forestachieved very similar effectiveness, and both of them considerably outperformedthe other algorithms. But compared with Random Forest, Adaboost (Random24 a) The *CF f −→ CF data(b) The *CF f −→ AF data(c) The *CF v −→ AV data
Figure 13: Comparison between different ML algorithms on different data sets: (a) the *CF f −→ CF data; (b) the *CF f −→ AF data; (c) the *CF v −→ AV data.
Finding 5:
CoRec is sensitive to the adopted ML algorithm. CoRec obtained the lowest prediction accuracy when Naïve Bayes was used, but acquired the highest accuracy when Adaboost (Random Forest) was used.

6.4. Effectiveness Comparison with ROSE and TAR
In our evaluation, we compared CoRec with the widely used tool ROSE [5] and a more recent tool, Transitive Association Rules (TAR) [9]. Both tools recommend changes by mining co-change patterns from version history. Specifically, ROSE mines association rules between co-changed entities from software version history. An exemplar mined rule is shown below:

{(Qdmodule.c, func, GrafObj_getattr())} ⇒ {(qdsupport.py, func, outputGetattrHook())}. (7)

This rule means that whenever the function GrafObj_getattr() in the file Qdmodule.c is changed, the function outputGetattrHook() in another file qdsupport.py should also be changed. Based on such rules, given a program commit, ROSE tentatively matches all edited entities with the antecedents of all mined rules and recommends co-changes if any tentative match succeeds. Similar to ROSE, TAR also mines association rules from version history. However, in addition to the mined rules (e.g., E1 ⇒ E2 and E2 ⇒ E3), TAR applies transitive inference to derive more rules (e.g., E1 ⇒ E3), where conf(E1 ⇒ E3) = conf(E1 ⇒ E2) × conf(E2 ⇒ E3).
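The derivation step can be sketched as follows. The rule representation (a mapping from entity pairs to confidence values) and the second rule's entities are illustrative assumptions, not TAR's actual data structures:

```python
# A minimal sketch of TAR-style transitive inference: from E1 => E2 and
# E2 => E3, derive E1 => E3 with the product of the two confidences.
def derive_transitive_rules(rules):
    """rules: {(antecedent, consequent): confidence} for mined rules."""
    derived = dict(rules)
    for (e1, e2), c12 in rules.items():
        for (e2b, e3), c23 in rules.items():
            if e2 == e2b and e1 != e3 and (e1, e3) not in derived:
                derived[(e1, e3)] = c12 * c23  # conf(E1 => E3)
    return derived

# Hypothetical rules echoing Eq. (7); confidences and someHelper() are made up.
mined = {("GrafObj_getattr()", "outputGetattrHook()"): 0.8,
         ("outputGetattrHook()", "someHelper()"): 0.5}
print(derive_transitive_rules(mined))
# additionally yields ("GrafObj_getattr()", "someHelper()") with confidence 0.4
```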
We configured ROSE with support = 1 and confidence = 0.1, because the ROSE paper [5] mentioned this setting multiple times and it achieved the best results by balancing recall and precision. For consistency, we also configured TAR with support = 1 and confidence = 0.1.

Table 8: Evaluation results of CoRec, ROSE, and TAR for *CF f −→ CF tasks (%)
Project     |      CoRec      |      ROSE       |       TAR
            | Cov Pre Rec F1  | Cov Pre Rec F1  | Cov Pre Rec F1
Node.js     |  77  68  69  69 |  61  24  56  34 |  65  15  62  24
Meteor      |  88  72  70  71 |  46  16  43  24 |  52  15  47  23
Ghost       |  73  67  74  71 |  50  20  53  29 |  50  14  57  22
Habitica    |  80  80  78  79 |  40   7  37  12 |  35   5  42   9
PDF.js      |  71  77  81  79 |  29  27  41  33 |  33   8  45  14
React       |  91  86  76  81 |  32  59  70  64 |  32  57  74  64
Serverless  |  84  77  79  78 |  64  20  75  32 |  68  16  80  27
Webpack     |  89  71  81  75 |  50   7  29  12 |  50   5  34   9
WA          |  83  72  73  73 |  53  21  52  29 |  57  15  59  24
Table 9: Result comparison among CoRec, ROSE, and TAR for *CF f −→ AF tasks (%)
Project     |      CoRec      |      ROSE       |       TAR
            | Cov Pre Rec F1  | Cov Pre Rec F1  | Cov Pre Rec F1
Node.js     |  79  69  74  72 |  59  20  52  29 |  61  14  61  23
Meteor      |  86  77  82  80 |  40  22  44  29 |  46  21  50  29
Ghost       |  85  86  85  85 |  46  18  46  26 |  50  14  49  22
Habitica    |  87  77  85  81 |  56   4  23   7 |  58   2  39   4
PDF.js      |  65  87  88  87 |  22   9  28  14 |  23  11  58  19
React       |  71  84  82  83 |  16  66   7  13 |  17  67   8  14
Serverless  |  84  71  85  77 |  73  19  59  29 |  74  15  60  24
Webpack     |  75  79  85  82 |  53  16  46  24 |  56  13  49  21
WA          |  81  76  80  78 |  54  21  48  28 |  56  16  55  24

Table 8 presents the evaluation results for *CF f −→ CF tasks. Take Webpack as an example: given the *CF f −→ CF prediction tasks in this project, CoRec provided change recommendations for 89% of tasks; with these recommendations, CoRec achieved 71% precision, 81% recall, and 75% accuracy. On the other hand, ROSE and TAR recommended changes for only 50% of the tasks; ROSE acquired only 7% precision, 29% recall, and 12% accuracy, while TAR obtained 5% precision, 34% recall, and 9% accuracy. Among the eight subject projects, the weighted average measurements of CoRec include 83% coverage, 72% precision, 73% recall, and 73% accuracy. Meanwhile, the weighted average measurements of ROSE include 53% coverage, 21% precision, 52% recall, and 29% accuracy. TAR achieved 59% average recall, but its average precision and accuracy are the lowest among the three tools. Such contrasts indicate that CoRec usually recommended more changes than ROSE or TAR, and that CoRec's recommendations were more accurate.

In addition to *CF f −→ CF tasks, we also compared CoRec with ROSE and TAR for *CF f −→ AF and *CF v −→ AV tasks, as shown in Tables 9 and 10. Similar to what we observed in Table 8, CoRec outperformed ROSE and TAR in terms of all metrics for both types of tasks. As shown in Table 9, given *CF f −→ AF tasks, on average, CoRec achieved 81% coverage, 76% precision, 80% recall, and 78% accuracy. ROSE acquired 54% coverage, 21% precision, 48% recall, and 28% accuracy. TAR obtained 56% coverage, 16% precision, 55% recall, and 24% accuracy.

Table 10: Result comparison among CoRec, ROSE, and TAR for *CF v −→ AV tasks (%)
Project     |      CoRec      |      ROSE       |       TAR
            | Cov Pre Rec F1  | Cov Pre Rec F1  | Cov Pre Rec F1
Node.js     |  79  72  77  74 |  55  20  65  31 |  56  16  74  26
Meteor      |  72  77  84  81 |  26   2  14   4 |  27   2  31   3
Ghost       |  84  75  81  78 |  46  18  46  26 |  38   8  70  14
Habitica    |  89  82  85  83 |  27  20  45  28 |  28  17  54  26
PDF.js      |  78  87  84  85 |  20   4  28   8 |  20   5  29   8
React       |  89  73  78  76 |  36   8  33  13 |  12  98  34  50
Serverless  |  70  80  85  82 |  34   0   0   - |  38   1  13   2
Webpack     |  87  86  83  85 |  36   8  33  13 |  40   3  34   5
WA          |  79  76  81  78 |  45  17  54  25 |  47  12  62  19
In Table 10, for Serverless, CoRec achieved 70% coverage, 80% precision, 85% recall, and 82% accuracy. Meanwhile, ROSE only provided recommendations for 34% of the tasks, and none of those recommendations is correct. TAR only provided recommendations for 38% of the tasks; with those recommendations, TAR achieved 1% precision, 13% recall, and 2% accuracy.

Comparing the results shown in Tables 8–10, we found the effectiveness of CoRec, ROSE, and TAR to be stable across different types of prediction tasks. Specifically, among the three kinds of tasks, on average, CoRec achieved 79%–83% coverage, 72%–76% precision, 73%–81% recall, and 73%–78% accuracy. On the other hand, ROSE achieved 45%–54% coverage, 17%–21% precision, 48%–54% recall, and 25%–29% accuracy; TAR achieved 47%–56% coverage, 12%–16% precision, 55%–62% recall, and 19%–24% accuracy. The consistent comparison results imply that CoRec usually recommended co-changed functions for more tasks, and that CoRec's recommendations usually had higher quality.

Two major reasons can explain why CoRec worked best. First, ROSE and TAR purely use the co-changed entities in version history to recommend changes. When the history data is incomplete or some entities were never co-changed before, both tools may lack evidence to predict co-changes and thus obtain lower coverage and recall rates. Additionally, TAR derives more rules than ROSE via transitive inference: if E1 ⇒ E2 and E2 ⇒ E3, then E1 ⇒ E3. However, it is possible that E1 and E3 should not be co-changed, so the derived rules can introduce incorrect recommendations and lower TAR's precision. Second, CoRec extracts code features to characterize the commonality between each changed function cf and the changed entity E on which cf depends (E is CF in P1, AF in P2, and AV in P3).

Table 11: The effectiveness of CoRec_u when it trains and tests a unified classifier (%)

Project     | Cov Pre Rec F1
Node.js     |  72  50  57  53
Meteor      |  77  59  58  59
Ghost       |  53  61  70  65
Habitica    |  55  53  68  60
PDF.js      |  29  60  73  66
React       |  76  75  73  74
Serverless  |  54  47  61  53
Webpack     |  66  54  63  58
WA          |  70  56  61  59

Although CoRec outperformed ROSE and TAR in our experiments, we consider CoRec a complement to existing tools. This is because CoRec bases its change recommendations on the three most popular RCPs we found; if some changes do not match any of the RCPs, CoRec does not recommend any change, but ROSE may still suggest some edits.

Finding 6:
CoRec outperformed ROSE and TAR when predicting co-changed functions based on the three recurring change patterns (P1–P3). CoRec serves as a good complement to both tools.

6.5. Comparison with a Variant Approach
Readers may be tempted to train a unified classifier instead of three separate classifiers, because the three classifiers all take the same format of inputs and output the same types of predictions (i.e., whether to co-change or not). However, as shown in Table 4, the commonality characteristics between co-changed functions vary with RCPs. For instance, the co-changed functions in P2 usually invoke common peer functions (i.e., FI), the co-changed functions in P3 often read/write common peer variables (i.e., VA), and the co-changed functions in P1 have weaker commonality signals for both FI and ST (i.e., common token subsequences). If we mix the co-changed functions matching different patterns to train a single classifier, it is quite likely that the extracted features become less informative and the trained classifier has poorer prediction power.

To validate our approach design, we also built a variant of CoRec—CoRec_u—that trains a unified classifier with the program commits matching any of the three RCPs (P1–P3) and predicts co-changed functions with that single classifier. To evaluate CoRec_u, we combined the data portions matching distinct RCPs for each project, and conducted five-fold cross validation. As shown in Table 11, on average, CoRec_u recommended changes with 70% coverage, 56% precision, 61% recall, and 59% accuracy. These values are much lower than the weighted averages of CoRec reported in Tables 8–10. The empirical comparison corroborates our hypothesis that when data matching distinct RCPs are mixed to train a unified classifier, the classifier works less effectively.
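The contrast between the two designs can be sketched as follows. This is an assumed structure for illustration, not the actual CoRec code; the per-pattern data sets are randomly generated placeholders:

```python
# Sketch: CoRec trains one classifier per RCP (P1-P3); CoRec_u trains a
# single unified classifier on the union of all three data portions.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

def new_classifier():
    return AdaBoostClassifier(estimator=RandomForestClassifier(n_estimators=100))

# Placeholder data: 12-dimensional feature vectors per candidate function pair.
rng = np.random.default_rng(0)
data_by_rcp = {p: (rng.random((50, 12)), rng.integers(0, 2, 50))
               for p in ("P1", "P2", "P3")}

# CoRec: each model only sees feature distributions typical of its pattern.
corec = {p: new_classifier().fit(X, y) for p, (X, y) in data_by_rcp.items()}

# CoRec_u: pattern-specific signals (e.g., FI, VA) get blurred together.
X_all = np.vstack([X for X, _ in data_by_rcp.values()])
y_all = np.concatenate([y for _, y in data_by_rcp.values()])
corec_u = new_classifier().fit(X_all, y_all)
```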
Finding 7:
CoRec_u, which trains a unified classifier with data matching distinct RCPs, worked less effectively than CoRec. This experiment validates our approach design of training three separate classifiers.
7. Threats to Validity
Threats to External Validity:
All our observations and experiment results are limited to the software repositories we used. These observations and results may not generalize well to other JS projects, especially projects that use the Asynchronous Module Definition (AMD) APIs to define code modules and their dependencies. In the future, we would like to include more diverse projects in our data sets so that our findings are more representative.

Given a project P, CoRec adopts commits in P's software version history to train classifiers that can recommend co-changes for new program commits. When the version history has few commits to train classifiers on, the applicability of CoRec is limited. CoRec shares this limitation with existing tools that provide project-specific change suggestions based on software version history [5, 39, 9]. To further lower CoRec's requirement for available commits in software version history, we plan to investigate more ways to extract features from commits and better capture the characteristics of co-changed functions.

Threats to Internal Validity:
In our experiments, we sampled a subset of commits in each project based on the keywords "bug", "fix", "error", "adjust", and "failure" in commit messages. Our insight is that developers may apply tangled changes (i.e., unrelated or loosely related code changes) in a single commit [40]; such commits can introduce data noise and bias our investigation. Based on our experience, the commits with the above-mentioned keywords are likely to fix bugs, and thus each of these commits may be applied to achieve one maintenance goal and contain no tangled changes. However, the keywords we used may not always accurately locate bug fixes, nor do they guarantee that developers applied no tangled changes in individual sampled commits. In the future, we plan to sample commits in other ways and analyze how our observations vary with the sampling techniques.
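For illustration, the sampling step can be sketched as follows. The exact matching rules used in the study are not specified, so case-insensitive substring matching over the commit subject line is an assumption:

```python
# A minimal sketch of keyword-based commit sampling over a local git clone.
import subprocess

KEYWORDS = ("bug", "fix", "error", "adjust", "failure")

def sample_bug_fix_commits(repo_path):
    """Return hashes of commits whose subject mentions any keyword.
    (The study may have matched full commit messages; subjects are used here.)"""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%H%x09%s"],
        capture_output=True, text=True, check=True).stdout
    samples = []
    for line in log.splitlines():
        sha, _, subject = line.partition("\t")
        if any(k in subject.lower() for k in KEYWORDS):
            samples.append(sha)
    return samples
```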
Threats to Construct Validity:
When creating recommendation tasks for classifier evaluation, we always assumed that the experimented commits contain accurate information about all co-changed functions. It is possible that developers made mistakes when applying multi-entity edits. Therefore, the imperfect evaluation data set based on developers' edits may influence our empirical comparison between CoRec and ROSE. We share this limitation with prior work [5, 39, 9, 41, 42, 8, 7]. In the future, we plan to mitigate the problem by conducting user studies with developers. By carefully examining the edits made by developers and the co-changed functions recommended by tools, we can better assess the effectiveness of different tools.

8. Related Work
The related work includes empirical studies on JS code and related programchanges, JS bug detectors, and co-change recommendation systems.
8.1. Empirical Studies on JS Code and Program Changes

Various studies have investigated JS code and related changes [43, 44, 45, 46, 47]. For instance, Ocariza et al. conducted an empirical study of 317 bug reports from 12 bug repositories to understand the root cause and consequence of each reported bug [43]. They observed that 65% of JS bugs were caused by faulty interactions between JS code and Document Object Models (DOMs). Gao et al. empirically investigated the benefits of leveraging static type systems (e.g., Facebook's Flow [48] and Microsoft's TypeScript [49]) to check JS programs [45]. To do that, they manually added type annotations to buggy code and tested whether Flow and TypeScript reported an error on the buggy code. They observed that both Flow 0.30 and TypeScript 2.0 detected 15% of the errors, showing great potential for finding bugs.

Silva et al. [47] extracted changed source files from software version history, and revealed six co-change patterns by mapping frequently co-changed files to their file directories. Our research is different in three ways. First, we focused on software entities at finer granularities than files; we extracted the co-change patterns among classes, functions, variables, and statement blocks. Second, since unrelated entities are sometimes accidentally co-changed in program commits, we exploited the syntactic dependencies between entities to remove such data noise and to improve the quality of identified patterns. Third, CoRec uses the identified patterns to further recommend co-changes with high quality.

Wang et al. [18] recently conducted a study on multi-entity edits applied to Java programs, which is closely relevant to our work. Wang et al. focused on three kinds of software entities: classes, methods, and fields. They created CDGs for individual multi-entity edits, and revealed RCPs by comparing CDGs. The three most popular RCPs they found are: *CM m −→ CM (a callee method is co-changed with its caller(s)), *CM m −→ AM (a method is added, and one or more existing methods are changed to invoke the added method), and *CM f −→ AF (a field is added, and at least one existing method is changed to access the field).

Our research is inspired by Wang et al.'s work. We decided to conduct a similar study on JS programs mainly because JS is very different from Java. For instance, JS is weakly typed and has more flexible syntax rules, while Java is strongly typed and variables must be declared before being used. JS is a scripting language mainly used to make web pages more interactive, while Java is used in more domains. We were curious whether developers' maintenance activities vary with the programming languages they use, and whether there are unique co-change patterns in JS programs. In our study, we adopted JS parsing tools, identified four kinds of entities in various ways, and did reveal some co-change patterns unique to JS programs because of the language's unique features. Surprisingly, the three most popular JS co-change patterns we observed match exactly the Java co-change patterns mentioned above. Our study corroborates observations made by prior work. More importantly, it indicates that even though different programming languages provide distinct features, developers are likely to apply multi-entity edits in similar ways. This phenomenon sheds light on future research directions of cross-language co-change recommendation.
8.2. JS Bug Detection and Repair

Researchers have built tools to automatically detect bugs or malicious JS code [50, 51, 52, 53, 2, 54, 55, 56]. For example, EventRacer detects harmful data races in event-driven programs [53]. JSNose combines static and dynamic analysis to detect 13 JS smells in client-side code, where smells are code patterns that can adversely influence program comprehension and software maintenance [2]. TypeDevil adopts dynamic analysis to warn developers about variables, properties, and functions that have inconsistent types [55]. DeepBugs is a learning-based approach that formulates bug detection as a binary classification problem; it is able to detect accidentally swapped function arguments, incorrect binary operators, and incorrect operands in binary operations [56]. EarlyBird conducts dynamic analysis and adopts machine learning techniques for early identification of malicious behaviors in JavaScript code [52].

Some other researchers developed tools to suggest bug fixes or code refactorings [57, 58, 59, 60, 61, 62, 63]. In more detail, Vejovis suggests program repairs for DOM-related JS bugs based on two common fix patterns: parameter replacements and DOM element validations [60]. Monperrus and Maia built a JS debugger to help resolve "crowd bugs" (i.e., unexpected and incorrect outputs or behaviors resulting from the common and intuitive usage of APIs) [61]. Given a crowd bug, the debugger sends a code query to a server and retrieves all StackOverflow answers potentially related to the bug fix. An and Tilevich built a JS refactoring tool to facilitate JS debugging and program repair [63]. Given a distributed JS application, the tool first converts the program to a semantically equivalent centralized version by gluing together the client and server parts. After developers fix bugs in the centralized version, the tool generates fixes for the original distributed version accordingly. In Model-Driven Engineering, ReVision repairs incorrectly updated models by (1) extracting change patterns from version history, and (2) matching incorrect updates against those patterns to suggest repair operations [64].

Our methodology is most relevant to the approach design of ReVision. However, our research is different in three aspects. First, our research focuses on entity-level co-change patterns in JS programs, while ReVision checks for consistencies between different UML artifacts (e.g., the signature of a message in a sequence diagram must correspond to a method signature in the related class diagram). Second, the co-change recommendation by CoRec intends to complete an applied multi-entity edit, while the repair operations proposed by ReVision try to complete consistency-preserving edit operations. Third, we conducted a large-scale empirical study to characterize multi-entity edits and evaluated CoRec with eight open-source projects, while ReVision was not empirically evaluated.

8.3. Co-Change Recommendation Systems
Approaches have been introduced to mine software version history and extract co-change patterns [47, 65, 66, 67, 5, 68, 39, 9, 69, 70, 71, 72, 73, 6]. Specifically, some researchers developed tools (e.g., ROSE) to mine the association rules between co-changed entities and to suggest possible changes accordingly [5, 68, 39, 9, 69, 73, 6]. Other researchers built hybrid approaches by combining information retrieval (IR) with association rule mining [70, 71, 72]. Given a software entity E, these approaches use IR techniques to (1) extract terms from E and any other entity and (2) rank those entities based on their term-usage overlap with E. Meanwhile, these tools also apply association rule mining to commit history in order to rank entities based on co-change frequency. In this way, if an entity G has significant term-usage overlap with E and has been co-changed frequently with E, then G is recommended to be co-changed with E (see the sketch at the end of this subsection).

Shirabad et al. developed a learning-based approach that predicts whether two given files should be changed together [67]. In particular, the researchers extracted features from a software repository to represent the relationship between each pair of files, adopted those features of file pairs to train an ML model, and leveraged the model to predict whether any two files are relevant (i.e., should be co-changed). CoRec is closely related to Shirabad et al.'s work, but differs in two aspects. First, CoRec predicts co-changed functions instead of co-changed files. With finer-granularity recommendations, CoRec can help developers better validate suggested changes and edit code more easily. Second, our feature engineering for CoRec is based on a quantitative analysis of frequent change patterns and a qualitative analysis of the commonality between co-changed functions, while the feature engineering by Shirabad et al. is mainly based on their intuitions. Consequently, most of our features are about the code commonality or co-evolution relationship between functions, while the features defined by Shirabad et al. mainly focus on file names/paths, routines referenced by each file, and the code comments together with problem reports related to each file.

Wang et al. built CMSuggester—an automatic approach to suggest method-level co-changes [7, 8]. Different from CMSuggester, CoRec is an ML-based instead of rule-based approach; it requires data to train an ML model before suggesting changes, while CMSuggester requires tool builders to hardcode the suggestion strategies. CoRec recommends changes based on three RCPs: *CF f −→ CF, *CF f −→ AF, and *CF v −→ AV; CMSuggester recommends changes based on only the last two patterns, so our approach is more widely applicable. CMSuggester applies partial program analysis (PPA) to identify the referencer-referencee relationship between entities; it does not work when PPA is inapplicable. Meanwhile, CoRec uses two alternative ways to infer the referencer-referencee relationship: typed-ast-util can accurately resolve bindings and link entities, while our heuristic approach links entities less accurately but is always applicable even when typed-ast-util does not work. Additionally, our evaluation is more comprehensive: we evaluated CoRec by integrating it with different ML algorithms and using the program data of eight open-source projects, whereas CMSuggester was evaluated with the data of only four projects. Overall, CoRec is more flexible due to its usage of ML and is applicable to more types of co-change scenarios.
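As an illustration of the hybrid IR-plus-history ranking idea mentioned above, the sketch below combines the two signals with a weighted sum. The cited tools differ in how they combine the signals, so the scoring scheme (Jaccard term overlap plus normalized co-change frequency) is an assumption:

```python
# A sketch of hybrid co-change ranking: term overlap (IR) + co-change history.
def rank_candidates(entity_terms, candidates, cochange_counts, alpha=0.5):
    """entity_terms: set of terms extracted from the changed entity E.
    candidates: {name: set of terms}; cochange_counts: {name: co-changes with E}."""
    max_count = max(cochange_counts.values(), default=1) or 1
    scores = {}
    for name, terms in candidates.items():
        overlap = len(entity_terms & terms) / max(len(entity_terms | terms), 1)
        history = cochange_counts.get(name, 0) / max_count
        scores[name] = alpha * overlap + (1 - alpha) * history
    return sorted(scores, key=scores.get, reverse=True)  # best candidates first
```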
9. Conclusion
It is usually tedious and error-prone to develop and maintain JS code. To facilitate program comprehension and software debugging, we conducted an empirical study on multi-entity edits in JS projects and built an ML-based co-change recommendation tool, CoRec. Our empirical study explored the frequency and composition of multi-entity edits in JS programs, and investigated the syntactic and semantic relevance between frequently co-changed entities. In particular, we observed that (i) JS software developers frequently apply multi-entity edits, and the co-changed entities are usually syntactically related; (ii) three most popular RCPs commonly exist in all studied JS code repositories: *CF f −→ CF, *CF f −→ AF, and *CF v −→ AV; and (iii) among the entities matching these three RCPs, co-changed functions usually share certain commonality (e.g., common function invocations and common token subsequences). Based on our study, we developed CoRec, which extracts code features from the multi-entity edits that match any of the three RCPs, and trains an ML model with the extracted features to characterize the relationship between co-changed functions. Given a new program commit or a set of entity changes that developers apply, the trained model extracts features from the program revision and recommends changes to complement the applied edits as necessary. Our evaluation shows that CoRec recommended changes with high accuracy and outperformed two existing techniques. In the future, we will investigate novel approaches to provide finer-grained code change suggestions and to automate test case generation for suggested changes.
References

[1] The 10 most popular programming languages, according to the Microsoft-owned GitHub, (2019).
[2] A. M. Fard, A. Mesbah, Jsnose: Detecting javascript code smells, in: 2013 IEEE 13th International Working Conference on Source Code Analysis and Manipulation (SCAM), 2013, pp. 116–125.
[3] A. Saboury, P. Musavi, F. Khomh, G. Antoniol, An empirical study of code smells in javascript projects, 2017, pp. 294–305. doi:10.1109/SANER.2017.7884630.
[4] R. Ferguson, Introduction to JavaScript, Apress, Berkeley, CA, 2019, pp. 1–10. doi:10.1007/978-1-4842-4395-4_1.
[9] M. A. Islam, M. M. Islam, M. Mondal, B. Roy, C. K. Roy, K. A. Schneider, [Research paper] Detecting evolutionary coupling using transitive association rules, in: Proc. SCAM, 2018, pp. 113–122.
[10] Z. P. Fry, W. Weimer, A human study of fault localization accuracy, in: 2010 IEEE International Conference on Software Maintenance, 2010, pp. 1–10. doi:10.1109/ICSM.2010.5609691.
[11] T. T. Nguyen, H. A. Nguyen, N. H. Pham, J. Al-Kofahi, T. N. Nguyen, Recurring bug fixes in object-oriented programs, in: ACM/IEEE International Conference on Software Engineering, 2010, pp. 315–324. doi:10.1145/1806799.1806847.
[12] Z. Yin, D. Yuan, Y. Zhou, S. Pasupathy, L. Bairavasundaram, How do fixes become bugs?, in: Proc. ESEC/FSE, 2011, pp. 26–36.
[13] J. Park, M. Kim, B. Ray, D.-H. Bae, An empirical study of supplementary bug fixes, in: IEEE Working Conference on Mining Software Repositories, 2012, pp. 40–49.
[14] Nodejs node, https://github.com/nodejs/node (2020).
[15] Fs: make callback mandatory to all async functions, https://github.com/nodejs/node/commit/21b0a27 (2016).
[16] What Is ES6 and What Javascript Programmers Need to Know, (2017).
[17] X. Ren, F. Shah, F. Tip, B. G. Ryder, O. Chesley, Chianti: A tool for change impact analysis of java programs, in: Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '04, ACM, New York, NY, USA, 2004, pp. 432–448. doi:10.1145/1028976.1029012.
[18] Y. Wang, N. Meng, H. Zhong, An empirical study of multi-entity changes in real bug fixes, in: Proc. ICSME, 2018, pp. 287–298.
[19] Esprima, https://esprima.org/ (2020).
[20] typed-ast-util, https://github.com/returntocorp/typed-ast-util (2020).
[21] Fix some fibers vs SQLite issues, https://github.com/meteor/meteor/commit/e9a88b00b9cdd35eb281c7113fcaa5155f006ea3 (2020).
[22] Meteor, https://github.com/meteor/meteor (2020).
[23] J. Falleri, F. Morandat, X. Blanc, M. Martinez, M. Monperrus, Fine-grained and accurate source code differencing, in: ACM/IEEE International Conference on Automated Software Engineering, ASE '14, Vasteras, Sweden, September 15–19, 2014, pp. 313–324. doi:10.1145/2642937.2642982.
[24] L. P. Cordella, P. Foggia, C. Sansone, M. Vento, A (sub)graph isomorphism algorithm for matching large graphs, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (10) (2004) 1367–1372. doi:10.1109/TPAMI.2004.75.
[25] Ghost, https://github.com/TryGhost/Ghost (2020).
[26] Habitrpg habitica, https://github.com/HabitRPG/habitica (2020).
[27] Mozilla pdf, https://github.com/mozilla/pdf.js/ (2020).
[28] Facebook react, https://github.com/facebook/react (2020).
[29] Serverless, https://github.com/serverless/serverless (2020).
[30] Webpack, https://github.com/webpack/webpack (2020).
[31] Storybook, https://github.com/storybookjs/storybook (2020).
[32] Electron, https://github.com/electron/electron (2020).
[33] Http2: introducing HTTP/2, https://github.com/nodejs/node/commit/e71e71b5138c3dfee080f4215dd957dc7a6cbdaf (2017).
[34] J. Ingeno, Software Architect's Handbook: Become a Successful Software Architect by Implementing Effective Architecture Concepts, Packt Publishing, 2018.
[35] Y. Freund, R. E. Schapire, Experiments with a new boosting algorithm, in: Proceedings of the Thirteenth International Conference on International Conference on Machine Learning, ICML'96, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1996, pp. 148–156.
[36] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.
[37] A. Liaw, M. Wiener, Classification and regression by randomForest, R News 2 (3) (2002) 18–22. URL https://CRAN.R-project.org/doc/Rnews/
[38] D. D. Lewis, Naive (Bayes) at forty: The independence assumption in information retrieval, in: C. Nédellec, C. Rouveirol (Eds.), Machine Learning: ECML-98, Springer Berlin Heidelberg, Berlin, Heidelberg, 1998, pp. 4–15.
[39] T. Rolfsnes, S. D. Alesio, R. Behjati, L. Moonen, D. W. Binkley, Generalizing the analysis of evolutionary coupling for software change impact analysis, in: Proc. SANER, 2016, pp. 201–212.
[40] K. Herzig, A. Zeller, The impact of tangled code changes, in: 2013 10th Working Conference on Mining Software Repositories (MSR), 2013, pp. 121–130. doi:10.1109/MSR.2013.6624018.
[41] N. Meng, M. Kim, K. McKinley, Lase: Locating and applying systematic edits, in: Proc. ICSE, 2013, pp. 502–511.
[42] M. Tan, Online defect prediction for imbalanced data, Master's thesis, University of Waterloo (2015).
[43] F. Ocariza, K. Bajaj, K. Pattabiraman, A. Mesbah, An empirical study of client-side javascript bugs, in: 2013 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2013, pp. 55–64.
[44] M. Selakovic, M. Pradel, Performance issues and optimizations in javascript: An empirical study, in: 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), 2016, pp. 61–72. doi:10.1145/2884781.2884829.
[45] Z. Gao, C. Bird, E. T. Barr, To type or not to type: Quantifying detectable bugs in javascript, in: 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), 2017, pp. 758–769. doi:10.1109/ICSE.2017.75.
[46] P. Gyimesi, B. Vancsics, A. Stocco, D. Mazinanian, Á. Beszédes, R. Ferenc, A. Mesbah, BugsJS: a benchmark of javascript bugs, in: 2019 12th IEEE Conference on Software Testing, Validation and Verification (ICST), 2019, pp. 90–101. doi:10.1109/ICST.2019.00019.
[47] L. L. Silva, M. T. Valente, M. A. Maia, Co-change patterns: A large scale empirical study, Journal of Systems and Software 152 (2019) 196–214. doi:10.1016/j.jss.2019.03.014.
[48] Flow: A Static Type Checker for JavaScript, https://flow.org (2020).
[49] TypeScript - JavaScript that scales, (2020).
[50] M. Cova, C. Kruegel, G. Vigna, Detection and analysis of drive-by-download attacks and malicious javascript code, in: Proceedings of the 19th International Conference on World Wide Web, WWW '10, Association for Computing Machinery, New York, NY, USA, 2010, pp. 281–290. doi:10.1145/1772690.1772720.
[51] F. S. Ocariza Jr., K. Pattabiraman, A. Mesbah, Autoflox: An automatic fault localizer for client-side javascript, in: 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation, 2012, pp. 31–40.
[52] K. Schütt, M. Kloft, A. Bikadorov, K. Rieck, Early detection of malicious behavior in javascript code, in: Proceedings of the 5th ACM Workshop on Security and Artificial Intelligence, AISec '12, Association for Computing Machinery, New York, NY, USA, 2012, pp. 15–24. doi:10.1145/2381896.2381901.
[53] V. Raychev, M. Vechev, M. Sridharan, Effective race detection for event-driven programs, ACM SIGPLAN Notices 48 (10) (2013) 151–166.
[54] J. Park, Javascript api misuse detection by using typescript, in: Proceedings of the Companion Publication of the 13th International Conference on Modularity, MODULARITY '14, Association for Computing Machinery, New York, NY, USA, 2014, pp. 11–12. doi:10.1145/2584469.2584472.
[55] M. Pradel, P. Schuh, K. Sen, TypeDevil: Dynamic type inconsistency analysis for javascript, in: 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, 2015, p. 314.
[56] M. Pradel, K. Sen, DeepBugs: A learning approach to name-based bug detection, Proc. ACM Program. Lang. 2 (OOPSLA). doi:10.1145/3276517.
[57] A. Feldthaus, T. Millstein, A. Møller, M. Schäfer, F. Tip, Tool-supported refactoring for javascript, SIGPLAN Not. 46 (10) (2011) 119–138. doi:10.1145/2076021.2048078.
[58] F. Meawad, G. Richards, F. Morandat, J. Vitek, Eval begone! semi-automated removal of eval from javascript programs, SIGPLAN Not. 47 (10) (2012) 607–620. doi:10.1145/2398857.2384660.
[59] S. H. Jensen, P. A. Jonsson, A. Møller, Remedying the eval that men do, in: Proceedings of the 2012 International Symposium on Software Testing and Analysis, ISSTA 2012, Association for Computing Machinery, New York, NY, USA, 2012, pp. 34–44. doi:10.1145/2338965.2336758.
[60] F. Ocariza, K. Pattabiraman, A. Mesbah, Vejovis: Suggesting fixes for javascript faults, in: Proceedings - International Conference on Software Engineering, 2014, pp. 837–847.
[61] M. Monperrus, A. Maia, Debugging with the Crowd: a Debug Recommendation System based on Stackoverflow, Research Report hal-00987395, Université Lille 1 - Sciences et Technologies (2014). URL https://hal.archives-ouvertes.fr/hal-00987395
[62] M. Selakovic, M. Pradel, Poster: Automatically fixing real-world javascript performance bugs, in: 2015 ICSE International Conference on Software Engineering, 2015, p. 811.